IA R UG Meeting July 12, 2021

inter-agency r-users meeting

Meeting Notes from July 12, 2021

Wendy Martinez
07-21-2021

Agenda

7/12/2021

  1. OMB meeting update

    Wendy and others are working on proposing an overarching policy to OMB for all agencies to utilize open source analytical software.

    Set up a meeting with the Chief Information Officer to discuss logistics. FDA is spending too much on analytical software and is attempting to propose open source software as the solution. http://opengovhub.org/home

    Scott/Census- Use agencies’ existing statements about innovation and discuss how innovation in statistics can’t exclude open source. Also, many federal agencies list employee recruiting and retention as primary goals. Excluding open source is going to make recruiting more challenging. Opened up conversation regarding hiring recent graduates who are exclusively learning R and Python: excluding that software from job responsibilities will affect recruitment and retention. Would ideally also like to pay for expanded support for R. Wendy, Tomas and Peter will follow up on this effort.

  2. Two events coming up:

    • EPA (Ann)- Discussion on open source and possible collaboration with agencies on using open source software. Event is July 23rd

    • BLS R Users Group: July 29th, Noon, talk on time series data in R

  3. BLS Membership with R Consortium

    This is in progress. Can join for free, but to gain consortium benefits the cost is ten thousand. R Consortium and Python Foundation: Anaconda includes RStudio.

  4. Agency updates

    • Census: The Census Bureau R Users Group is growing and expanding its userbase and presentation materials

    • EPA: Bi-annual workshop is coming up. The 21st is open to all, but the 22-23 workshops are mostly limited to EPA

    • FDA: Intro to shiny workshop and other workshops

    • NASS: Machine learning project is in the works

Meeting notes

We had about 20 attendees – which is fantastic! Here is an agenda for our meeting today. WebEx information is below.

Update on meeting with OMB

Summary document/email was attached to the meeting invite.

Peter Meyer (BLS) can talk about Open Source Software (OSS), but has to leave early. Wendy asked for thoughts on what should be the next step after the meeting with OMB. Set up meeting with CIO of the US? Tomas (FDA) said we should have an entity to shepherd this. FDA said they are paying too much. Tomas said that could be an opening to justify it. Paul said he had gotten pushback on his advocating for R. Peter said we could get data on what – Joe Castle (?) at GSA – is doing. Peter could connect us with him. This relates to the policy published 6 years ago and was mentioned in the OMB meeting summary. There was a project called code.gov. Is there something called open.gov? Tomas will do some checking, but doesn’t know the entity exactly. It’s a private company. FU: Wendy, Tomas, and Peter. And, Scott of course.

George: New data scientists and statisticians being trained use Python and R. It is hard to retain employees if we do not give them the tools. What are the implications for HR and procurement if open source is used?

Tomas reviews grants for NIH. In a lot of the bio-stats and bio-informatics study sections, proposals get points off if they don’t have an open-source application.

Census R UGs have consolidated and have approximately 250 users. Yves made the point that OSS is not really free because some support is needed from staff. Census would rather keep the money (support) for commercial.

Upcoming events for R Users

Wendy sent invitations to get them on calendars. Feel free to forward to others.

July 23: Organized by Ann – EPA: Vince Allen is Chief Architect. He has fully embraced open-source at EPA. He will provide an overview on open-source. An opportunity to have a meeting of Chief Architects across federal agencies. So, maybe agencies would adopt and be open like EPA. Maybe organize a meeting of Chief Architects. George asked whether we could get some statements – DOT IG got approval to use R. Could we get some documents? New CIOs and the Chief Data Officers could also be the intended audience. FU: Organize this event and maybe work with CSPOS to organize a workshop.

July 29 – BLS R UG @ noon. R and Time Series.

Memberships with R Consortium

Wendy will meet with Joe of R Consortium about BLS becoming a member. She will give a summary of the meeting. She said that the American Statistical Association is also becoming a member.

R Consortium and Python Foundation (Nathan)

Versioning is an issue. USDA and NASS – there is willingness to support R and Python. Not a lot of clarity on how to do that well. Enthusiasm at NASS about the Anaconda distribution of Python because Anaconda has updated its terms of service for commercial entities using the Anaconda distribution and paying a nominal fee. Gives them something to buy into. Maybe an update schedule or maintenance of repositories. He had a thought – the Anaconda distribution allows access to R packages as well, although it seems to be a much smaller number of packages, and might not have the ones they need (e.g., Bayesian ones). There seems to be some willingness, but there does not seem to be consensus on how to do it. FU: Check on this with R Consortium.

Have other agencies had these discussions? MS R Open used?

Nathan would like to get IT staff involved in these conversations. CSPOS wants to bring these up. Nathan will ask CSPOS Matt if they can/should look into this.

Agency updates and issues

BLS: Wendy gave an update on what the R Governing Board is up to. She knows that some (George) asked for copies of the documents, but she is waiting until they are in better shape. The GB is still working on the process and how they are going to set up the official version of R.

BLS Data Science Curriculum: Wendy mentioned the DS Curriculum, which is a semester-long training that includes courses and a capstone project. FU: Wendy will ask Brandon Kopp to give a talk on it.

Ann: Preparing for EPA bi-annual R Workshop. This year will be virtual from Sep 21 – 23 in the PM. She will share registration, but might have to watch some numbers with certain sessions. Workshops: Intro to R, four advanced and intermediate classes. The plenary on the 21st will likely be open to all. Plenary is Jenny Bryan of RStudio. They will also have talks on what folks are doing in R.

Census: Bureau R UG – anyone can join. Jessica Klein is new Chair of the group of about 250 members. They are hiring a lot of new data scientists.

FDA: Tomas did an Intro to Shiny workshop that went well. Several ongoing initiatives. FDA receives R programs from sponsors that document their analyses that go into submission packages (devices, etc.). They are working with R Consortium to enable a demonstration project that shows how sponsors can submit programs and proprietary code. Paul echoes concerns that others raised. Finds more and more of incoming staff have experience with R as opposed to other statistical software.

NASS: They have a small machine learning IG – teaching, bringing together people with shared interests. Might have comments in response fields – looking for useful information using ML approaches. They have a text analysis project in the works.

From the Chat:

from Peter M to everyone: 1:12 PM http://opengovhub.org/home

from Scott A to everyone: 1:17 PM Perhaps an avenue to be explored may be to use agencies existing statements about innovation and discuss how innovation in statistics can’t exclude open source. Also, many federal agencies list employee recruiting and retention as primary goals. Excluding open source is going to make recruiting more challenging.

from Peter M to everyone: 1:18 PM YES.

from Grace E to everyone: 1:20 PM I am currently in my Masters program for Applied Statistics, and we are only using R. In maybe 1 class we will use SAS

from Ann A to everyone: 1:20 PM R is super standard in most academic fields now, from math to social sciences. Even when academics have experience in SPSS or SAS, it’s supplemental to R or, for example: how common is it to have R users come in and have to learn SAS? are agencies ready to upskill R users to SAS? That’s an additional onboarding expense The R community is very self-policing, especially since so much of it is used for academia and public benefit

from Scott A to everyone: 1:30 PM Agree 100% with Ann

from Ann V to everyone: 1:31 PM Our security staff tests OSS for security vulnerabilities. There’s a process for obtaining freeware/shareware which involves this as part of that process.

from Andrew Y to everyone: 1:32 PM CMS/CCSQ is using Zeppelin on top of Hive

from Ann V to everyone: 1:32 PM python has more vulnerabilities than R - so they keep a little tighter control on python installations than R. We have R, Rstudio and Rtools installed via something called “big fix” - not sure if other agencies have that, but we could share the script if useful.

from Nathan C to everyone: 1:34 PM Ann, could you forward any more information on that to me?

from Ann V to everyone: 1:34 PM Nathan - the big fix script?

from Nathan C to everyone: 1:34 PM Yes, or any detail about supporting that bundle well on EPAs end.

from Ann V to everyone: 1:35 PM yes - will do

from Sunita Y to everyone: 1:42 PM Hi Nathan, I am at USDA and I still do not have R installed properly. I’ll reach out via email if you have tips on how to install R at USDA.