Please register and make submissions here. For additional information, please contact email@example.com
Submissions to Taxacom listserv
Society for Management of Electronic Biodiversity Data (SMEBD)
What SMEBD is and does:
The Society for the Management of Electronic Biodiversity Data (SMEBD) is a professional society of scientists who author biodiversity databases published on the internet through collaborative initiatives such as Euro+Med Plantbase, FADA, Fauna Europaea, ERMS, WoRMS and PESI. Membership is free and open to all contributors to these initiatives. SMEBD holds the copyright, intellectual property rights and, where appropriate, ownership of the contributing databases. It acts on behalf of its members to defend their interests, manage the databases and provide a legal basis for their protection. It promotes the publication, dissemination and use of data and information, and the development of new biodiversity data-related initiatives and projects.
What SMEBD does or demands in regard to re-use of content (i.e. licensing and attribution): SMEBD grants permission to re-use content produced by its networks of members, after consultation with the executive committee of the initiatives as appropriate, and ensures that due attribution is made by the re-user. It concludes agreements with major users in the interests of the authors of content and promotes involvement of original authors/networks as recognized partners in funded projects building on the original source.
SMEBD's opinion on rights: SMEBD promotes free access to information and is not in principle against re-use, but there are issues, and some of our member networks (e.g., WoRMS) do not allow re-use; all enquiries must go through the custodians. Arguments against free re-use of information are (1) data quality issues: ensuring that the latest version of the database is used, and preventing errors by reducing the number of transmissions; and (2) attribution and sustainability issues: not allowing re-use makes it much easier to ensure proper attribution, and allows the primary source to identify users and report on data usage to funders. If users and funders are not clear who is creating the resource, and the database creators are not sure who their users are, it is really difficult to raise funds by any means.
We can provide ample examples of cases of internet resources or even commercial works that re-use data without due attribution.
On "... if and how we share names" (copied from the mail to Taxacom): we don't believe that "sharing" as a concept to describe re-use of data is appropriate; rather we approach the issue as "use of published information", which is well within the comfort zone of contributors.
Submitted by Hendrik Segers
Royal Botanic Gardens, Kew (RBGK)
We're not sure of the context of the question or how these contributions will be used. We expect providers of data who need to support the maintenance, development and dissemination of data to have different views from data aggregators and end users, though ultimately we probably all have the same goal: to maximise the use and impact of the data.
At the moment we face two conflicting pressures: on one hand we want to make data available to maximise its use; on the other we are being told by our funders (government and others) that we must seek to generate income from our intellectual property. Botanical and mycological name data will probably not attract additional income in itself, but the reputation of the institutions (and/or individuals) providing that data can be enhanced through recognition of their expertise in this area, and that may lead to the development of services, consultancies or grants which do provide income. For this income potential to be realised the providers of the data need to be accredited. Broadening the range of potential funders is becoming increasingly important as Government funding is cut.
What role Kew plays
Kew represents nomenclaturalists and taxonomists on staff. We act as the compilers of nomenclatural and taxonomic information, and also as an aggregator of taxonomic information curated by our partners (e.g. that submitted to the World Checklist Programme and The Plant List). We fund the compilation effort via dedicated editorial posts, and we are active in biodiversity informatics initiatives.
Relevant resources curated at Kew
Nomenclatural resources – Kew acts as a partner in the collaborative International Plant Names Index and Index Fungorum projects.
Taxonomic resources – Kew acts as the compiler of the World Checklist of Selected Plant Families and as a partner in the Plant List.
These resources are not simply lists of names. For example, IPNI's value is in the editorial judgements on the nomenclatural status of those names. The editors annotate names to reflect their compliance with the nomenclatural code, standardise elements (such as authorship and literature citations) and create cross links between the names. So a lot of the data is created atop that which is found in the original literature. As well as the effort involved in creating the resource, the editors need skill and judgement in the interpretation of the nomenclatural code. As an organisation we need to enable them to acquire that skill, and provide them with a botanical library to research their judgements.
Costs of curating these resources
Maintenance, development and dissemination of data has costs and we need to be able to demonstrate the value and impact of such resources in order to continue to develop them and justify expenditure.
These costs can undoubtedly be shared by broadening the community of those supporting and curating the data. Our ambition would be to broadly expose the data via web services which would allow remote editing and feedback to improve the data, rather than having multiple copies of the data evolving in parallel. The challenges here are to find the resources necessary to build and maintain these services and to fully accredit, at the record level, those who curate the data. Funding this has been difficult, and the level of accreditation does not fully recognise individuals who might give up their time to help maintain the data. We don’t think every individual needs to be accredited when large chunks of data are cited, but that information should be available. Individuals themselves should be able to cite their input into these resources. Having record-level metadata would also allow users to see the source of that expert opinion. In theory these opinions could be categorized (e.g., high quality from a peer-reviewed monograph, low quality from an occurrence with minimal taxonomic context), but this could be difficult to apply consistently. However, having a record-level source may allow the user to make some judgement.
Licencing, and its purpose
We assert copyright and database rights over the name resources Kew is involved with, either as part of a collaborative organisation or as an institution. This recognizes the investment and intellectual effort in designing and maintaining integrity within these resources, and facilitates licensing of these resources to help us navigate the best course between open and free access and ensuring that we can maximise our ability to maintain and develop these resources. Through licensing we try to ensure:
- Data providers are accredited
- We have a better understanding of how data is being used, so that we can better plan development of the resource and services
- More effective collaborations through discussion of potential data use
- Development of joint funding proposals
- Commercial opportunities are exploited with due benefit to the data providers
- Out-of-date parallel copies of data are minimised
- Respect for differing collaborators’ views with regard to access to their data (different collaborators frequently want different levels of openness; sometimes we have permission to display data, but not pass it on wholesale)
- Users are aware of the limitations in future data provision such as non-permanency of the data set or unstable identifiers (which usually are a result of limited funding).
- Tracking use of the data and feeding back these stats/knowledge to data suppliers (an e-equivalent to literature citations).
There is an administrative cost to handling licenses and we try to restrict the use of these to substantial sets of data. We don’t think we have found the ideal licence which balances the demands and desires of completely open access and the need to sustain the data and services, and we have probably made mistakes in the past.
We need to maximise access to the data without damaging our ability to maintain, develop and disseminate the data in the long term through reduced income generation potential, either directly via competition, or indirectly by reduced citation.
Comments on workshop
As of 2013-04-12, the number of submissions is very small (five including this one, of which two were comments on the workshop posted elsewhere rather than direct submissions). It would therefore be difficult to argue that the views of the community are adequately represented. The panel who will consider the submissions seems to have relatively few representatives from data providers, which may mean that this viewpoint is not sufficiently explored or understood.
Submitted by Nicky Nicolson
Scott Gardner (posted on Taxacom 7th March)
Should scientific results (for instance, names of new species) be the domain of business and business models? Business models are established to make money from something; of course, money is an abstraction that is difficult to live without. Licensing names ("taxon names") or compilations of names (for a business to make money) should probably only apply to general copyright. Individual scientists can make the decision to publish their work in journals that exist primarily behind "pay walls", or they can publish in open journals that have a broader philosophy of how science should be made available to the masses of humanity.
Back to Names: When we are talking about names of the species on the earth, it is hard for me to understand licenses for names for things that live with us. Maybe the licensing of names idea should be only for new life forms that would not otherwise ever have existed without human invention (like Craig Venter's synthetic life project) - the new things will need names after all. Will the prokaryote biologists propose new nomenclatural regulations for these new forms of synthetic bacterial life? Will the name then be licensed? Going further into the past, while thinking of the future, I think that our way of naming species using binomial nomenclature is already providing the system that, perhaps, could be licensed based on the rights of the author (example: Linnaeus). Then every species name "published" as valid should be attributable to the Linnean system, and when names are made and published, as noted above, the author would need to pay the holder of the rights to the Linnean system to get the name into the system for use. Hmmmm, this seems like the payment system for IP addresses and domain names.... We will soon need to be logging in to "GoLinnaeus.com" to pay our lifetime license fee for the name of a new species. Then are the rights transferable to an offspring or accumulator, and can the rights then be sold to the highest bidder? I guess that I am setting up my business model now - or is this how ZooBank should be funded?
I still think that species names should be open, not licensed, not for profit, and for scholarly communications, there should be free access without pay walls. This is to enable scientists in places like Mongolia, or Zambia, or Bolivia, or Haiti to gain access to scientific articles. We here in the lands of plenty can generally pay for what we want or need, or get to a library that has a way to get a request for free, but this is not true for vast swaths of humanity in South America, Central America, Africa, and Asia.
Now: into the arena of biodiversity and understanding species, for example, we will never be able to make a push to save habitats for Forest Elephants if the data that show they are being extirpated rapidly are held behind pay walls. Here is a partial example:
Doom of the elephant-dependent trees in a Congo tropical forest (Original Research Article)
Forest Ecology and Management, Volume 295, 1 May 2013, Pages 109-117
David Beaune, Barbara Fruth, Loïc Bollache, Gottfried Hohmann, François Bretagnolle
If you have a Username & Password, you may already have access to this article. If you do not have a Username and Password, click the "Register to Purchase" button below to purchase this article.
Price: US $ 31.50
Wow - I bet that lots of people in Africa can access this right now for only $31.50.
I am a systematist/taxonomist who specializes on the insect superorder Neuropterida (orders Neuroptera, Megaloptera and Raphidioptera). Over the past 25 years I have developed extensive datasets on these insects, including global taxonomic and nomenclatural datasets. The Neuropterida contains ca. 7300 valid species (including both extant and fossil taxa), ca. 10,100 available species-group names, and ca. 21,000 genus-[subgenus]-species-[subspecies]-[infrasubspecies] "combinations". I have personally verified relevant original-publication data for ca. 99.9% of all available genus- and species-group names in the Neuropterida, so these datasets are of high quality from that standpoint. I work as a professor at a major research university in the United States, which influences my views on data sharing and attribution/crediting.
My General Outlook:
I generally support the free sharing of nomenclatural data under a typical academic attribution model. I share my taxonomic/nomenclatural datasets through a research website (Lacewing Digital Library [LDL]). I have also shared these datasets with the Catalogue of Life project, and am working to provide the data directly to GBIF as well. Personally, I'm not too concerned about receiving attribution/credit from occasional users of small numbers of names, though that is nice, and could increase the visibility of resources like the LDL. But receiving attribution/credit from users who either make extensive use of selected nomenclatural data, or who benefit from scooping up large amounts of data, seems only appropriate. I wouldn't want to assign intellectual property rights to the use of particular scientific names, and assigning such rights to simple compilations of names seems fraught with peril in terms of information sharing. However, as the value added to simple lists of names increases -- for instance, by the inclusion of such data as publication details and source, nomenclatural status, taxonomic status, synonymy, geographic/chronostratigraphic/lithostratigraphic distributions, links to literature, primary type data, ... -- the intellectual content and investment by name compilers increases substantially, and the expectation of proper attribution/credit also increases.
From the standpoint of an academic, a major difficulty in justifying the time and effort spent to initially develop (and, subsequently maintain) a digital nomenclatural dataset, and to share that dataset, is how to turn that effort into one or more "peer-reviewed publication-equivalents". In my view, biological dataset aggregators like CoL, GBIF, and others have largely failed to address this issue. In their rush to pull in as much data as quickly as possible, some datasets of questionable quality have been incorporated, and it is generally not possible to rigorously evaluate the relative quality of different datasets along any of a variety of axes that might be used to assess their quality. The inability to evaluate dataset quality elevates the perceived value of poor datasets, and depresses the perceived value of high-quality datasets.
Some Things That I Would Like to See Accomplished:
(1) Establish more rigorous standards for assessing the "quality" of nomenclatural datasets. These standards need to go beyond specifying required and optional fields. They need to more explicitly address the precise forms of data expected in those fields, and the inclusion of metadata for documenting how well the delivered data match the target standard. Data delivered by aggregators need not initially be at the highest quality level to be functional, but it should be possible for users to differentiate levels of quality among different datasets (perhaps using multiple criteria), and data providers should be clear about the quality scale(s) used to differentiate among datasets, so that they can work toward providing the highest quality data.
(2) Establish a formal review system for datasets, and establish a set of criteria for performing such reviews. Data aggregators should do this to understand and ensure the quality of the data that they are aggregating and to establish a "peer-reviewed publication equivalent" for data contributors. The review system and criteria should explicitly address, and perhaps differentiate among, the initial submission of a dataset, and the resubmission of updated datasets. If data aggregators expect to receive long-term continuing support from academic data providers, they must be prepared to provide some form of "sequential review" for updated datasets, so that such updates can be considered by their providers/authors as separate "publications", thus allowing the incremental effort to be reported as such in the context of reviews of academic performance. As data aggregators continue to fill in the major gaps in coverage across all taxonomic groups with initial datasets, the maintenance and updating of those datasets by qualified data providers will become a major data quality issue moving into the future. Major data aggregators can help academic data providers by providing systems that result in "peer-reviewed publication equivalents", which are units that are meaningful to academic data providers in the context of reporting academic performance.
(3) In all data dissemination products of data aggregators, clearly establish, differentiate, and provide for the attribution/credit due to (1) data aggregators, for the value added of aggregation and novel dissemination mechanisms, and (2) data providers, for their data compilation and maintenance efforts.
He didn't actually submit it, but he did write about it..... http://iphylo.blogspot.ca/2013/03/on-names-attribution-rights-and.html
iPhylo Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics.
On Names Attribution, Rights, and Licensing of taxonomic names
Few things have annoyed me as much as the following post on TAXACOM:
The Global Names project will host a workshop to explore options and to make recommendations as to issues that relate to Attribution, Rights and Licensing of names and compilations of names. The aim of the workshop is a report that clarifies if and how we share names.
We seek submissions from all interested parties - nomenclaturalists, taxonomists, aggregators, and users of names. Let us know what (you think) intellectual property rights apply or what rights should be associated with names and compilations of names. How can those who compile names get useful attribution for names, and what responsibilities do they have to ensure that information is authoritative. If there are rights, what kind of licensing is appropriate.
Contributions can be submitted at http://names-attribution-rights-and-licensing.wikia.com/wiki/Main_Page, where you will find more information about this event.
I'm trying to work out why this seemingly innocuous post made me so mad. I think it is because this is fundamentally framing the question the wrong way. Surely the goal is to have a list of names that is global in scope, well documented, and freely usable by all without restriction? Surely we want open and free access to fundamental biodiversity data? In which case, can we please stop having meetings and get on with making this so?
If you frame the discussion as one of "Attribution, Rights and Licensing of names and compilations of names" then you've already lost sight of the prize. You've focussed on the presumed "rights" of name compilers instead.
I would argue that names compilations are somewhat overvalued. They are basically lists of names, sometimes (all too rarely) with some degree of provenance (e.g., a citation to the original use of the name). As I've documented before (e.g., "More fictional taxa and the myth of the expert taxonomic database" and "Fictional taxa"), entirely fictional taxa can end up in taxonomic databases with alarming ease. So any claims that these are expert-curated lists should be taken with a pinch of salt.
Furthermore, it is increasingly easy to automate building these lists, given that we have tools for finding names in text, and an ever expanding volume of digitised text becoming available. Indeed, in an ideal world where all taxonomic literature was digitised much of the rationale for taxonomic name databases would disappear (in the same way that library card catalogues are irrelevant in the age of Google). We are fast approaching the point where we can do better than experts. To give just one example, in a recent BHL interview with Gary Poore it was stated that:
For example, the widely used name Pentastomida itself was widely attributed to Diesing, 1836, but the word did not appear in the literature until 1905.
A quick check of Google Ngrams shows this to be simply false:
I don't need taxonomic expertise to see this, I simply need decent text indexing. So, if you have a list of names, you have something that it will soon be largely possible to recreate using automated methods (i.e., text mining). With a little sophistication we could mine the literature for further details, such as synonymy, etc. Annotation and clarification of a few "edge cases" where things get tricky will always be needed, but if you want to argue that your list deserves "Attribution, Rights and Licensing" then you fail to realise that your list is going to be increasingly easy to recreate simply by crawling the web.
It seems to me that most taxonomic databases are little more than digitised 5x3 index cards, and lack any details on the provenance of the names they contain. They often don't have links to the primary literature, and if they do cite that literature they typically do so in a way that makes it hard to find the actual publication. I once gave a talk which included the slide below showing taxonomic databases as being "in the way" between taxonomists and users of taxonomic information:
In the old days building taxonomic databases required expertise and access to obscure, hard to find, physical literature. A catalogue of names was a way to summarise that information (since we couldn't share access). Now we are in an age where more and more primary taxonomic information is available to all, which removes most of the rationale for taxonomic databases. Users can go directly to taxonomic information themselves, which means they can get the "good stuff", and maybe even cite it (giving us provenance and credit, which I regard as basically the same thing). In many ways taxonomic databases are transitional phenomena (like phone directories, remember those?), and one could argue they are now in the way of the taxonomists' Holy Grail, getting their work cited.
Lastly, any discussion of "Attribution, Rights and Licensing of names and compilations of names" reflects one of the great self-inflicted wounds of biodiversity informatics, namely the reluctance to freely share data. As we speak, terabytes of genomics data are whizzing around the planet, and people are downloading entire copies of GenBank and creating new databases. All of this without people fussing over "Attribution, Rights and Licensing." It's time for taxonomic databases to get over themselves and focus on making biodiversity data as accessible and available as genomics data.
Noel Heim, Paleobiologist
I am an academic researcher who primarily uses large compilations of taxonomic names, typically from the Paleobiology Database, to investigate biodiversity over the Phanerozoic. I am not a taxonomist, but I have done taxonomic work in the past and understand the magnitudes of effort and expertise required to name and classify organisms. In fact, there is very little research that could be done on biodiversity without the work of taxonomists and systematists.
Intellectual property rights for taxonomic names themselves should not be granted to or held by the taxonomists that create them. Taxonomic names have in practice been created, distributed and used within the public domain since the time of Linnaeus. Attribution is codified in the ICZN and ICN, so applying a CC BY license to taxonomic names is redundant, and anything more restrictive would restrict the science of taxonomy itself. A CC0 license for taxonomic names will allow the important work of taxonomists to continue unencumbered.
Licensing datasets of aggregated names from sources such as the Paleobiology Database or the Global Names Project is potentially more complicated. Even here though, I think a CC0 license is most appropriate, although I can certainly see the argument for a CC BY license. Aggregating data and developing the digital infrastructure for its distribution is time consuming. Given that the vast majority of taxonomic databases not compiled by governments are compiled by academics, the need for proper attribution is paramount. However, the norms of scientific publication dictate that users of aggregated data cite their sources. Exactly how to cite a dataset extracted from the Paleobiology Database, for example, is at the moment unclear, but the lack of clarity is not related to licensing. If I used a Paleobiology Database dataset in a research article I would cite it in exactly the same way whether it were released under a CC0 or a CC BY license. Scientific norms require citations, and as a community we need to agree upon citation standards that give credit to both data aggregators and the taxonomists who published the original works. Simply applying a CC BY license to datasets will not solve the problem of how to give credit where it’s due.
Applying non-commercial (NC) licenses, I think, is unnecessarily restrictive. The main argument I’ve heard from those involved with database projects who advocate NC licenses is that they volunteer their time by contributing to these projects and would feel cheated if someone else made a profit off their efforts. I would argue that academics who contribute to data aggregation projects are not volunteers. As academics, we all draw a full time salary and in return we are expected to produce research results and contribute to the intellectual growth of our chosen fields. It is easy, but incorrect, to confuse the luxury of having a choice of where to focus our professional efforts with volunteering. It is usually true that we are not directly paid by database projects, but our contributions benefit our careers and ensure that the institutions we work for will continue to pay us. If the more than 200 scientists who contribute to the Paleobiology Database were to cease their contributions, they would then have to find some other project on which to focus their time and effort in order to continue their research and service. Again, attribution is key, but the complexities of giving attribution to data generators and aggregators are not solved by applying restrictive licenses. They are solved by reaching a consensus as a community on how to cite aggregated data.
No derivatives (ND) licenses are indefensible, and I suspect that when such licenses have been applied to scientific databases it has been done because of a misunderstanding of what the licenses mean. An ND license renders a dataset useless. A bar chart showing the number of species within families of insects, for example, derived from a list of species extracted from a taxonomic database is a derivative and, if published or distributed, would violate an ND license. An ND license means that the dataset cannot be used.