Working Groups

Theresa -- GIS GROUP

GIS Group needs more people
GEONIS needs a service agreement

Don -- SENSOR GROUP

Smart Forest
Sensor Networks
Streaming Data webpage- who has?-- Wade, Sven, Phil, Margaret, :: add more ::
Where are they now

John -- PASTA GROUP

better search options
METADATA organization
too many results in some cases
first all site searchable data

Inigo -- DRUPAL GROUP -----ason

code commits for the master branches of DEIMS
special ones for each group, like mcmurdo, ntl :: add more ::
CSV v. EML etc.
Training? - last year Taiwan too far? but this year, 3 sessions
Development, training, migration and adoption

Wade -- DATABITS ISSUE

what can we add
next databits editor

POSTERS / WORKING Groups

Tuesday -- 3D scientific visualization, Long's Peak Diamond West, 4:00-6:00
Wednesday -- 1:30 to 3:30 Mountain West East, Ned Gardiner, Visualization Workshop
Tuesday -- 8:45 AM Ned's Talks
Tuesday -- 10 am, DEIMS talk - Phil
Wednesday -- 4 pm, Migrating your site to DEIMS
Tuesday - 1:30-3:30 an exploration in LTER Large Data -- "big data"? Haha.
Wednesday - 8:00 to 9:45 - GCE Sensor Toolbox usage, opportunities to get deeper into it, needs assessment
- DataONE and the discovery problem (is this like SEO?) Semantic web features
- new data center ad hoc group

JAMES HAD A MOUSTACHE AS A BABY

Phil -- (leading discussion) IM going forward with new structure

provide a well-documented set of core network information management services that support data preservation and re-use
lower the barrier to adoption
etc. :: add more ::
physical entity or more PI's
governance / operations committee
administrative functions are needed
Operations Committee:
- NISAC, IM EXEC, etc. committees will have overlap
Project management:
- The work that we are doing stays on track
- Management Processes - clearly defined. See flow chart?
- Data processing and discovery
Define a basic workflow
- input from the IMC
- Science colleagues
- Executive Core helps us to spend money in a way that makes sense to all involved
- Project selection/ prioritization process
Communications office collaboration?
IGERT grant for collaboration of broader sciences with IM's
NSF funding -- extension from Saran?
Can we move faster than that?
Theresa questions about meeting and funding :: could not hear well please fill in ::

Breakout Topics

service framework, core and additional, creative activities
strawman process for science input
committee structure, interactive relationships
technical infrastructure
budget development

In the new structure, how will we define our relationships so that they are not unbalanced?

Peter Groffman came by

Margaret -- technical interactions for on and off campus
UVA, Wisconsin, etc. interested.
Rules on the distribution of the money
Accountability from the ESIP PI?
Could not do this from office -- could not use ESIP infrastructure.
If ESIP will facilitate the science, it would be good.
Many lunches?

Communications office

NCEAS
planned not as an inheritance of current components
scientific programming support
continuity
science working groups -> synthesis working groups and education
full-time communications person
EcoLog - best established list serv

LISTSERV@LISTSERV.UMD.EDU

body message "SUBSCRIBE ECOLOG-L"
Matt Jones
Director of computing
Collaborative versioning and training the importance of code as a scientific product
Inigo : how is the communications office going to help?

Group 4 Notes: Begin with overview, then some less organized points

Start with brainstorming - Cor

Itemization of what has worked, and what we can continue and expand like Design working groups, IM working groups, EML mentor, GCE Working groups- MCM

Thematic data centers, centers of excellence around various data themes, streamflow chemistry, working groups that could address that specifically so that the datasets can be named and structured in a way that they could be more easily pulled together at one site and people map in somehow - Don

A synthesis theme that the scientists would like? Lay the foundation for better described data that scientists could actually use - Don, Corinna, Emery

Corals of the Future - Science Synthesis working group that used the Moorea Model and tried to apply it to other sites - main suggestion was making a consistent data product modelled after the LTER data on Coral. -- MCM

Scope of data products - how can we get to a data product that is more flexible? Develop base level products that you can do analysis and synthesis that fits with the needs -- MCM, Corinna, Emery, Don

Best practices for how to do the stream data (or any kind of example data) - Climate best practices, Stream best practices, etc. - use the recommendations. Not complete uniformity but at least consistency across the measurements - Emery

NCEAS - first practice is to say what data we have and what condition that it is in. Some PI's don't know what they have, but others are able to provide lovely data on a link. Hardest thing is to keep people from bottlenecking on data that already exists. Scope is narrow for some things like coral reefs; scope is broad for things like stream chemistry. -MCM

Scientists will always want much more than we could provide - general consensus

Put the data into a public repository - EML, LTER, DataONE, CUAHSI -- preexisting schema may help. Funding of people who do this ran out. We need documentation of the labs, methods, significant digits, detection limits, etc. If that was out there, sites could fit into it better. Attributes could be standardized, but right now they are not. Alba, etc. spent a lot of time trying to clean those things up. A real data component to a science synthesis project -- Don, MCM, Corinna

NCEAS requires every science working group to be involved with their programmers and the liaison -- do they actually have a lot of products?

Will NSF Fund this? VEG-E (no), Clim-DB (yes)-- Corinna, Don, Emery

In theory....

we have a database of high value, scientists would approve it, how to get the data into the database and have people use it?
a funded representative helps to pull the data into the database
is there another way?
taking the idea of a center of excellence a little further and actually managing the data for all the sites -- not using the local IM at all but rather processing all the data for all the sites -- trade off the data into a new gear
direct communication between external IM at the excellence center and the field techs. It has potential to be more efficient. Have best practices sessions first.
Full range of variability that is at all the sites. Output final structure outputs. Don (or theoretical excellence person) designs the output to fit whatever is best.
What is the currency?
Data types:
- Streaming Data
- Organism Surveys
- Climate
- "Critter Counts"
- ITIS descriptions : taxonomy changes faster than ITIS?
sites might be not wanting to give up control
best practices - sites would develop as well as they could - a person can move between the sites to make a nice synthetic data set. Maybe updated every so often.
Could we have flagship products which are live and they update themselves
Scientists pass it over to the data scientists

Group 4's conclusions

3 ideas : we don't really know which would work, but it might cost some $ for these, and we don't want to ask the $ for this

trading of units of work
synthesis person on a specific topic
center of excellence for a particular topic

What do we do if a site has something to offer but it doesn't fit the need of the other sites?

Can we send currency to other sites? DEIMS center? Web center of excellence? Web space for everyone?

Programming services -- Wade is very generous, people are using his product. PASTA-prog is helpful but it could be overwhelming.

What does being a network mean?

same descriptors for everyone
people worried about interest in data or stealing data
NSF cares that data is accessible, available, and can be easily used
NSF slow to recognize data management is important
would need data managers to visit NSF
working in a federated fashion?

Groups overviewed what they found (1-5). Theresa, Gastil, Inigo, Phil, :: add names ::

Afternoon -- Corinna, Don, Me, Gastil, Mark from NCEAS, Emery, :: add names ::

how can we make environmental data management more efficient
workflow development
how can we efficiently use the tools which are already out there
done vs. perfect

More efficient vs. better data product
Resistance of turning the data over to another data product to process
Mark - scientists often don't know that many important data models exist; it's quite possible that researchers at a particular site don't know that a given model is good
LTER does not have a catalog of expertise
Currency may be really lopsided - training is critical in those cases
Bitcoin

what are good data products and what are useful data products?
xml, knb, PASTA, sites each doing things their own way is probably not as good-- synthesis working groups will not want to synthesize these groups
nut-net and drought-net -> standardized sampling is not really an option for us
can we ever get to the place where things just flow easily into a database?
adding more work to what we already do
mapping would be watched over
The larger issue is making the data at the site discoverable and integrable for the synthesis efforts
new data and legacy data -- may need to be prioritized -- should we get the data in or work on how we get it out.

Common schema and common syntaxes for expressing our managements and structures - now what?

site contributes?
site does not contribute?
person working there

Corinna<- great question, HOW CAN WE MAKE INFORMATION MANAGEMENT MORE EFFICIENT?
who pulls the data together and processes them?
analysts?
information modelers coming in, scientific programmers
expert in some kind of data is the one who works on that kind of data on behalf of a subset of sites that have that type of data
HDF/NetCDF :)
Recommend this directory of specialty:

"Specialist Services" : each site has a person who is an expert in one type of data who promotes an explicit model with certain data types.
Training?

Later afternoon meeting

MOU
working with DataONE
incentive to have own site would decrease over time
ways to make information management more efficient
was a little late

Re-meet-up

Theresa --> services group, lots of documents, added some to those, use the GEONIS as a template from which other services can involve.

John --> Training, graduate students, not only contribute data but use the data we are providing, the role of data quality and metadata quality. (Wade notes we need a way to evaluate quality)

Phil --> We talked more about NISAC. Who said we have to attend all three days of ASM -- 2 day IM meeting? Proposal team (small, not know how to make), review team (one person from each group)

Emery --> make the work more efficient and streamlined, and think about working with groups outside of LTER - DataONE, EarthCube, datalink, ESIP, etc. Learn about what they are doing and think about how to frame the proposal in a way that might extend beyond the LTER. We also want to work with NCEAS and clarify what the relationships will be between our office and theirs.

Margaret --> present our decision to them the way a faculty search would: a position description, our decision process, and why that person is a good fit. Four volunteers from EB want to review our proposal. 2-3 weeks between iterations? We might have a draft by September.

Sites may want to participate in selection of the PI - we might create a document we can share with them. Phil and Theresa and Corinna would help to put it together. They would distribute it this week.

Template to the sites by the 4th. By the 17th, submit a mini-proposal from each site (will fill in the template)

Margaret's session on: Semantic web and NPP Data

NPP data - how to get
Data ontology we should explore
"increase terminological rigor in the sciences"
etymology of certain words important
there is some physical reality that is independent of the mind and we want the words to actually describe it - concepts or terms
Big Data = Volume, variety, velocity
Lehman and Tilman - so many definitions of the word "stability" -- how do we compare conclusions amongst many?
how do we capture the notion of forests
Simple semantics SKOS
we need a better concept of terms at the dataset level
URL : uniform resource locator, static, rendered, pointers to places. If you're on a page let's talk
- could be called IRI's to represent international
- URI's : abstract that notion of location to identification
- URI's tell you about something at that end point - an identifier that is a "global identifier"
- Identify some resource. Any place on the web you can find.
- Persistence : identifiers that stick around for a long time
- what if we use the label like "is part of", "preys upon", "generates"
  
  IRI: passenger pigeon IRI: hasConservationStatus IRI: extinct
set theory:
- difference between instances and subclasses - important to know
- The wine is the class, the data about the wine is the subclass, the bottle is the instance
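The triple and the class/instance distinction above can be sketched with plain Python tuples instead of a real RDF library. This is only an illustration: the `ex:` identifiers, `RedWine`, and `Bottle42` are made-up names, and "subclass" is used in the usual sense of a narrower class (red wine within wine), which may differ from how the wine example was phrased in the session. Only the passenger pigeon statement comes from the notes.

```python
# Each statement is a (subject, predicate, object) triple of identifiers.
# The "ex:" prefix stands in for a full IRI; all names are illustrative.
triples = {
    ("ex:PassengerPigeon", "ex:hasConservationStatus", "ex:Extinct"),
    ("ex:RedWine", "rdfs:subClassOf", "ex:Wine"),  # subclass: every red wine is a wine
    ("ex:Bottle42", "rdf:type", "ex:RedWine"),     # instance: one particular bottle
}


def superclasses(cls, triples):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def is_instance_of(thing, cls, triples):
    """True if thing is typed as cls or as any subclass of cls."""
    for s, p, o in triples:
        if s == thing and p == "rdf:type" and cls in superclasses(o, triples):
            return True
    return False
```

With these triples, the bottle counts as an instance of wine because its class is a subclass of wine, while the red wine class itself is not an instance of anything.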
Ontologies based on web standards are really where we want to be, given the technical standards of the web.
what sort of structures are really useful to make something better than simple prototype ontologies
LET'S READ THE W3C STANDARDS TOGETHER!!!!
owl
protege
annotation : binding a concept to instances of that concept in your data
semantic annotation - the scientist provides description about what information means
ecological cycling concepts - how to build an ontology - download an ontology editor such as Protege and use it to make a good system

contacts

Mark Schildhauer
Margaret O'Brien

Protege tool

can be used to make an ontology paper - puts in sources and stuff, you can annotate the column or the whole dataset with that concept, and you'll be saying that the data set adheres to the definition given
adopting the SKOS ontology
we included some connection information. Everyone should have an ORCID iD or something else similar.
Term: "knowledge modeler"
Test queries - improve discovery -- build up a test corpus of the matches to the queries
Precision : how much of what you get back is what you want (low precision = a lot of cruft)
Recall : how much of what you want you actually get back
semantics - how to broaden that search?
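A rough sketch of scoring one test query against a test corpus, assuming the corpus is simply a mapping from each query to the dataset IDs a person judged relevant. The query text and dataset IDs below are invented for illustration.

```python
def precision_recall(returned, relevant):
    """Score one query.

    precision = share of returned items that are relevant
    recall    = share of relevant items that were returned
    """
    returned = set(returned)
    relevant = set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical test corpus: query -> dataset IDs judged relevant by a person.
test_corpus = {"stream discharge": {"ds1", "ds2", "ds3", "ds4"}}
# Hypothetical search output for the same query.
search_results = {"stream discharge": ["ds1", "ds2", "ds9"]}

query = "stream discharge"
p, r = precision_recall(search_results[query], test_corpus[query])
```

Here two of the three returned datasets are relevant, and two of the four relevant datasets were found, so broadening the search would aim to raise recall without letting precision collapse.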

Eda brings up links between the DEIMS and Ontologies

LTER SKOS Vocabulary
Consensus can be avoided if you put in a "fait accompli" -- "uhm you did a lot of work but we can't do much better"
a nice, polished product that has utility for the scientists
ANPP means annual or aboveground net primary productivity
annotation properties: just "tags"? Not really deeply searchable
what is the binding - like if you bind to carbon, does it bind only to that level or does it also bind to the things above it?
measurements is the focus right now
import the subtrees of the ontologies that you need
example is "lignin" -- a behavior ontology
tagging is the formal semantics
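One way to picture the binding question (does an annotation on a term also count for the things above it?) is a search that follows SKOS-style "broader" links. This is a sketch under assumptions: the three-term hierarchy and the dataset names are invented, and a real vocabulary would be loaded from the LTER SKOS file rather than typed in.

```python
# Hypothetical "broader" links, child term -> parent term.
broader = {"lignin": "organic carbon", "organic carbon": "carbon"}

# Hypothetical annotations, dataset -> the single term it was tagged with.
annotations = {
    "ds_litter": "lignin",
    "ds_soil": "organic carbon",
    "ds_precip": "precipitation",
}


def with_broader(term):
    """Return the term followed by all of its broader terms, nearest first."""
    chain = [term]
    # Stop at the top of the hierarchy, or if a cycle would repeat a term.
    while chain[-1] in broader and broader[chain[-1]] not in chain:
        chain.append(broader[chain[-1]])
    return chain


def search(query, annotations):
    """Datasets whose tag is the query term or something narrower than it."""
    return {ds for ds, term in annotations.items() if query in with_broader(term)}
```

Under this choice a search for "carbon" also finds the dataset tagged "lignin", while a search for "lignin" does not pull in everything tagged with the broader terms.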

DRUPAL Workshop

two people from UGA/GCE
using DRUPAL but don't know much
interested from Oracle perspective
David Blankman - the EML evangelist - Woods Hole, FCE, etc.
good for managing personnel and for managing content
bibliographic content fits in nicely
do not do a lot of the editing in drupal itself
how does new data get into DRUPAL?
forms are a pain in the butt

Very lovely streams in McMurdo:

Here is a general link to the McMurdo streams site: MCM

Inigo has many options to do things with forms. A user doesn't see these forms. User goes to the new draft form and then moves from the display of the dataset to its form view. Dataset ID is part of the NIS system in PASTA.
Editor can paste stuff straight from word.
content or ancillary. Each tab has different aspects of the data. You can add things like data sources, personnel, etc. The custom things in github can be special categories.
i.e. in McMurdo long-term vs. short-term.
very long maintenance field can be edited to hold history.
seems like a very good tool for linking people with data. in McMurdo the lead PI is usually the one listed.
selectable names in the database - this seems friendly.
methods : shows how we got the data in there
pull down of the core areas - actual LTER 5 core areas - can label the data set with different templates. Helps to create automated views and lists of things. CORE ---> THEMES ---> SITE SPECIFIC; linked to LTER controls made by John Porter.
put in the dates - 2015, etc. What happens if you put in a bad date format?
related information / relevant links
papers that result from the dataset - ones that are on your website- they are not referenced from an external source.
geo reference - can do all kinds of geo data -- string, shape file, different workflow puts these in so you don't have to worry that a stream is a point. renders the data set. Very good.
has its own CMS system built in, and includes an email re. commitments. When the moderation state moves to published it's okay.
good data gathering workflow -- this is a reasonable system for establishing that.

Most of Drupal is forms--
physical data, you upload it, it's BIG!
you can remove and replace the new data set, and upload it there. There's a ton of meta data you can add in.
you can store the delimiter
connectors to databases
your data database is separate from your drupal database
painfully rich metadata
discharge in L/s at MCM ? CFS at HJA. Hmmm
data explorer will let you query that special database using the native variables; builds a form that exposes the variable and allows users to filter on that variable...
date times must match, discharge rates must match
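The unit mismatch noted above (L/s at MCM, CFS at HJA) is the kind of thing that has to be harmonized before discharge can be compared across sites. A minimal sketch of that conversion follows; the function names and the unit labels "cfs" and "L/s" are assumptions for illustration, not anything DEIMS provides.

```python
# One cubic foot is exactly (0.3048 m)**3 = 0.028316846592 m^3 = 28.316846592 L.
LITERS_PER_CUBIC_FOOT = 28.316846592


def cfs_to_liters_per_second(cfs):
    """Convert discharge from cubic feet per second to liters per second."""
    return cfs * LITERS_PER_CUBIC_FOOT


def to_liters_per_second(value, unit):
    """Normalize a discharge value to L/s; unit is 'L/s' or 'cfs'."""
    if unit == "L/s":
        return value
    if unit == "cfs":
        return cfs_to_liters_per_second(value)
    raise ValueError(f"unknown discharge unit: {unit!r}")
```

Raising an error on an unknown unit, rather than passing the value through, keeps a mislabeled column from silently mixing into a cross-site comparison.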

how does this work:

somehow this never gets hacked!
inigo hacked the core of the mysql driver to keep the date space time titles to fix this. wow. amazing!
here is a page with the data sets and summaries, plus links to data explorer
content page from mcm
populates the name and variables
custom deims makes lovely images - inigo made a view in a block and put it into google images as a kml. really super awesome - renders a map with the polygons from the spatial data
d3 js used - nothing has to be written, reads right off the database. jquery/ajax for graphs. biblio module.
responsive web design has "priority columns set up"
accessibility thing
invisi-mail, here's a form field obfuscator
The Book of DEIMS has all the interesting ways to install and use. Also it has "wizard". Clone from GitHub.
Why is Drupal faster on Windows than nice Ubuntu?

Queue of things to do

Improve and well document how to use project.roles <- this is something that could need fixing.
Improve personnel - metadata provider - front end

Morning Plenaries

Ned - Theresa's friend, data visualizations. Information manager from Coweeta.

Climate data
area of global change science where the models are great and the communication is important
originally worked in biodiversity -- LTER is his foundation
information to knowledge
REACHING PEOPLE WHERE THEY ARE!
the arctic - climate action plan - ned's team supported this with his toolkit
working actively with NASA
the trajectory of the Arctic was a real concern. Really cared about sharing that message. color and "age" of ice
prioritizing outreach
data curation
data access (especially satellite)
sharing code
A UNIQUE AND VERY POWERFUL CONCEPT
really lovely data cover visualizer. Show people how remote sensing, multi-spectral systems work. Made a nice visualization that allows scientists to make maps that are all over the world. deconstruct how maps are made
biodiversity theme at the museum - bioviz
people who are non-scientists don't take in the information the way we do. "when people who are non-scientists see shifting colors on a map they say, 'oh, shifting colors on a map, okay'"
researchers out of yale - greenhouse gases read experiment
storytelling
data.gov
visualizations embedded within narratives can help people understand and build relationships
"watch the story" ==> simple visualizations can be VERY POWERFUL
FOX: This guy is my inspiration. I must be this Ned Gardiner guy.

Challenges and Solutions Relating to Storage, Managing, and Delivering of High-Resolution data

What is big data?

share our ideas and experiences with storing and managing and delivering large data fields
services we might need at the network level?
big data set relates to the use of advanced methods like modeling to extract value from data
- large, complex, or both
Ecological Systems Theory --> very complicated lots of stuff
how large is large?
reference list of existing resources for storing large data files
how large is large?
is there a constraint to the file size we can use in synthesis processes?
is the whole more than the sum of its parts?
pasta can't handle our really large data sets -- what do we do?
transfer of data as cost versus storage as costs!
How to plan for giant data?
- talk to prospective PIs and scientists
- RAID arrays etc. lose some storage capacity; the capacity for storage doubles every year and a half to two years
- host institution can help to add space
- network capacity to transfer the files is limiting
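The two planning points above (capacity doubling and the network as the limit) lend themselves to back-of-envelope arithmetic. The sketch below assumes decimal terabytes (10^12 bytes) and a sustained link speed in megabits per second; the function names and example figures are illustrative, not measurements from any site.

```python
def projected_capacity(current_tb, years, doubling_years=1.5):
    """Capacity after `years` if it doubles every `doubling_years` years."""
    return current_tb * 2 ** (years / doubling_years)


def transfer_hours(size_tb, link_mbps):
    """Hours needed to move size_tb terabytes over a link_mbps link.

    Assumes 1 TB = 10**12 bytes and that the link runs at full speed
    the whole time, so real transfers will usually take longer.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600
```

For example, `projected_capacity(10, 3)` gives 40 TB after three years at the faster doubling rate, and `transfer_hours(1, 100)` comes to a bit over 22 hours for one terabyte on a 100 Mbps link.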
I would suggest this: https://www.chameleoncloud.org/docs/user-guides/openstack-kvm-user-guide/
budgets and funding cycles
- IM's must compete with scientists?
- Oregon state shares costs
- Pointer to pull down the large data
- Expanding using NAS HDD
project forms?
part of project planning phase?
other?
Amazon S3
it's expensive if you are transferring
Inigo is using the cloud

methods

downloads
do it in the cloud
torrent
tarballs
tapes / flash disks
mp3s
Map and image services as separate services
De-dupe: "cyclic redundancy checking - makes sure there's not 600 copies of the same data on the server - everyone else gets just a pointer to it." -- really great approach. The spatial data area is available to everyone, so people are not making multiple copies of the same data.
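A small sketch of the store-once, point-many idea described above. The notes mention cyclic redundancy checking; this example swaps in a SHA-256 digest from Python's `hashlib`, since a short CRC is more likely to give two different files the same checksum. The file paths and contents are invented, and files are held in memory only to keep the example short.

```python
import hashlib


def deduplicate(files):
    """Keep one copy of each distinct content.

    files: mapping of path -> bytes.
    Returns (store, pointers), where store maps checksum -> content and
    pointers maps every path -> the checksum of its content.
    """
    store = {}
    pointers = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # first copy wins, later ones are skipped
        pointers[path] = digest
    return store, pointers


# Hypothetical uploads: two users saved the same raster under different names.
files = {
    "theresa/dem.tif": b"elevation-grid",
    "don/dem_copy.tif": b"elevation-grid",
    "wade/landcover.tif": b"landcover-grid",
}
store, pointers = deduplicate(files)
```

Three paths end up backed by two stored copies, and each user's path still resolves to the right content through its pointer.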

Judy Cushing and Susan's talk on VISUALIZATION!

Did not get as many notes as I'd like because it was very interesting so I was sort of in my zone. Will be getting copies of notes from participants.

3D visualizations

back transform space into image space
watched video
Theresa - fusion, watershed 1,
point cloud lidar
Andrews LiDar - big data set many folders
profile of stream or trees
Bob - VELMA, Vistas - overlay, new technology is on the way!
stream flow vs. soil moisture v. other drivers
Envision and models - Allison
technology - Chris

Wade's MATLAB / GCE Toolbox Section

Quick overview, then talking about how to use with Coweeta
Automating Sensor Data Collection with the GCE
started at GCE in 2000 - more than 4000 downloads and used at 80 + sites in LTER and elsewhere
MATLAB is a MathWorks tool : it's costly, but not bad as far as commercial software goes. It does allow tinkering with the source code. It is scalable.
Data model that will work well with long-term data.
any number of numerical and text variables, structured metadata documentation along with it.
thoughts from fox : really it's a pretty great tool, when it's managed well for a specific source and site. with good programming knowledge and maintenance, and maybe a little bit of tinkering, I feel more inspired by this tool today than I do normally. Wade has a mastery of this tool that I did not see in other experiences with it.
all of its functions must use the attributes in its metadata to work with it out of the box.
Campbell logger files, Seabird Logger files, Hobo logger files, etc. * SQL data sources, CLIM-DB, Hydro DB, etc.
Data Turbine and other streaming data middleware.
"friends don't let friends type metadata" - it is a smart tool that gets what it can from the data headers
gives you managed data and fully documented data sets
there is a streamlined method to use the software without having to generate all the metadata and stuff, you can actually work with only the import and data stream parts -- I wonder if what the Andrews is doing is this? We seem to have a lot of metadata we don't use?
post-processing tools, can set rules on the drive values to carry through QC information as you work with it. The data and metadata can be exported in a variety of formats, or pushed into a relational database. Pushing directly into the Drupal system is a future possibility. CUAHSI data model. HTML, KML, XML, SQL dbs, etc. Generate the web dashboard from the box without doing special stuff.
select and merge can be automated, as well as cleaning between certain days. Run any number of harvesters on a little schedule. Grow those files and go back in later and review the data
every operation is in the context of a dataset
there's a wiki site, svn repository -- I should ask if I can migrate it to github
user-support
list-serv
training opportunities / workshops
ISO standards aren't really used in MATLAB structure, because it is not so tabular? :: I didn't really follow this, but I think I missed something ::
harvest workflows
web design is coupled tightly- file is read from station to website
trimming features are built in to trim down the size of text files that can become unusable.
you can do a lot in just the GUI
data set editor is the tool with the menu system
loads a naked campbell file and assigns nice names to it; doesn't know the units yet because they weren't in the campbell logger; comes in with basic data type information. Floating points, strings, etc. all described in there -- these help us use it in certain ways.
in the gui, you can change all the organization and structure of the attributes-- like wade showed how he changed "record" to ordinal from whatever it came in on
data table viewer in the database. the data is edited, but once it's destroyed or changed you can't go back. So you have to save an intact version of the original first.
Big edits, like filtering, will generate for you a backup - but small operations, like making a new flag, will not.
the date time- generates date component columns - there is a manual way to do all these changes.
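To illustrate what generating date component columns means, here is a small Python sketch. It is not the toolbox's own code (which is MATLAB); the timestamp format and the column names are assumptions chosen for the example.

```python
from datetime import datetime


def date_components(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Split timestamp strings into one dict of component columns per row."""
    rows = []
    for text in timestamps:
        t = datetime.strptime(text, fmt)
        rows.append({
            "year": t.year,
            "month": t.month,
            "day": t.day,
            "hour": t.hour,
            "minute": t.minute,
            "day_of_year": t.timetuple().tm_yday,
        })
    return rows


rows = date_components(["2015-08-31 13:30"])
```

A timestamp that does not match the expected format raises `ValueError` here rather than being guessed at, which is one way to catch a bad date early.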

wade says "Python is a much better match for this sort of data and toolkit" :)

^^ note the above being said for repeatability ^^

change the data select
right now there is not "flag semantics" - so you must always throw all the flags -- I, Q, V, etc. master list of flag with some priority ordering. They would maybe do this if they got more funding though :)
documentation from this presentation would be useful for the future because there is a lot of documentation on the toolkit, so the wiki helps pare that down too.
QC Framework - data model has built in storage for the flags ('flags','values')
Data structure must fit the right framework before it can even go into the typed data system with all the rules. The qc is the finer scale qc
there are tools for bulk flagging and shift correction
the toolbox generally thinks in a vectorized way
rule based QC can "cause as many problems as you solve"
there are nice provisions for going back and doing manual flagging. This is one of GCE's nice features. This is good for removing flags too!
you can import and copy flags.
flags can really be manipulated very well manually. it has a ton of tools for this. Wade also has a good QC strategy -- limit rules, set based rules, etc. Multi-column dependency checks.
:: computer had to update, missed a few here :: but it is possible to interpolate missing date times
you can hide the missing values
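The QC ideas above (flags stored alongside values, limit rules, manual flagging and un-flagging) can be sketched in a few lines of Python. This is not the GCE Toolbox implementation: the flag meanings (I for invalid, Q for questionable), the function names, and the temperature limits are assumptions for illustration.

```python
def apply_limit_rules(values, valid_range, expected_range):
    """Return one flag per value: 'I' outside valid_range,
    'Q' outside expected_range, '' otherwise."""
    flags = []
    for v in values:
        if v is None:
            flags.append("")  # missing values are left unflagged in this sketch
        elif not (valid_range[0] <= v <= valid_range[1]):
            flags.append("I")
        elif not (expected_range[0] <= v <= expected_range[1]):
            flags.append("Q")
        else:
            flags.append("")
    return flags


def set_flag(flags, index, flag):
    """Manually set (or clear, with '') one flag; returns a new list."""
    updated = list(flags)
    updated[index] = flag
    return updated


# Hypothetical air temperature column in degrees C, with one missing reading.
column = {"values": [12.5, -80.0, 41.0, None]}
column["flags"] = apply_limit_rules(column["values"], (-50, 60), (-10, 40))

# A reviewer decides the 41.0 reading was real and clears its flag by hand.
reviewed = set_flag(column["flags"], 2, "")
```

Keeping flags in a parallel list leaves the values untouched, so a rule that turns out to cause more problems than it solves can be rerun or overridden without losing data.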

PRECIP meeting

:: please fix if I got your name bad ::

Claire- cedar creek
Luquillo (didn't get name)
CU Boulder NIWOT Ridge LTER
NTL Sam Zipper - agricultural and urban systems
UGA person! GCE.
Dom - groundwater

groundwater and precip and trees

continuum of drought and the gradient in certain systems
oscillations and such, look at mm versus difference from historical averages
historical rain vs. continuous rain vs. sample consistency over time
sample when there is a lot of rain
stream flows and run off -- stream discharge, what's available to the forest, etc. evapotranspiration.
cedar creek inputs and penetration are so different from luquillo and from andrews and water access.
satellite
luq.lternet.edu/article/2015/7/10/effects.... :: link very cool graph with greenhouse and soil o2 ::
