Clamav sidecar proof of concept #81

rahearn · 2020-12-04T20:58:33Z

Description of change

This installs clamav in our app container and ensures that virus definition files are kept up to date. It does not run the clamd daemon in the background, so scans only happen when clamscan is called manually with a file or directory.

Pros to this approach:

not much for us to maintain going forward, does not use a custom buildpack
will satisfy controls around automated updating of virus definition files, no matter how often we deploy or restart the app
could potentially be expanded to cover scanning app files, if we get pushback that we need to do that as well

Downsides:

need to figure out how we're going to call clamscan on uploaded files
uses a ton of memory compared to the app by itself: 1.5G of RAM was the bare minimum to scan the src directory without crashing, and 1G was the minimum just to get the freshclam process to complete
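
One possible shape for the "how do we call clamscan on uploaded files" question, as a minimal sketch rather than a settled design. It relies on clamscan's documented exit codes (0 = no virus found, 1 = virus found, 2 = error). The function name scan_upload and the CLAMSCAN_BIN override are hypothetical and exist only so the scanner can be stubbed out.

```shell
# Hypothetical wrapper: scan one uploaded file and print clean / infected / error.
# CLAMSCAN_BIN is an assumed override hook; it defaults to the real clamscan.
scan_upload() {
  file="$1"
  status=0
  # --no-summary drops the summary block; -d points at the definitions directory.
  "${CLAMSCAN_BIN:-clamscan}" --no-summary -d "$CLAMAV_DATA_DIR" "$file" \
    >/dev/null 2>&1 || status=$?
  case "$status" in
    0) echo "clean" ;;
    1) echo "infected" ;;
    *) echo "error" ;;   # 2 is a clamscan error; anything else means it did not run
  esac
  return "$status"
}
```

Something like `scan_upload /path/to/upload` would then let the caller branch on either the printed word or the return status. Each call still pays clamscan's full startup cost.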

Other notes:
This requires version 7 of the cf-cli tool. That's easily installed on macOS (brew install cf-cli@7), but I didn't look into other operating systems.

How to test

This app is deployed as tta-smarthub-ryan to the tta-transient-ryan space. Run

cf target -s tta-transient-ryan
cf ssh tta-smarthub-ryan
/tmp/lifecycle/shell
clamscan -d $CLAMAV_DATA_DIR src

The last command runs a scan of the src directory:

vcap@15bdae76-905e-49d1-782a-142e:~$ clamscan -d $CLAMAV_DATA_DIR src/

----------- SCAN SUMMARY -----------
Known viruses: 8946438
Engine version: 0.102.4
Scanned directories: 1
Scanned files: 0
Infected files: 0
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 26.048 sec (0 m 26 s)

Issue(s)

As a user, I want uploaded files to be scanned for malicious code, so that I don't have to call the IT department (RA-05, SI-03) HHS/Head-Start-TTADP#123

Checklist
Not ready to merge, so deleting the checklist.

rahearn · 2020-12-07T15:59:32Z

Got some feedback from other 18F engineers on this as well. They suggested the possibility of setting up another cloud.gov app that exposes a clamav API.

https://blog.theodo.com/2017/11/implement-antivirus-api-10-min/ describes the general approach, and there are reports of similar setups being successfully implemented in cloud.gov

Pros to this approach:

memory overhead limited to a single clamav app instance, rather than having to absorb it on every instance
calling out to scan a file is a REST call rather than shelling out to the system, which avoids a potential point of vulnerability

Cons to this approach:

Will not allow for scanning the app container (probably not something we need to do anyway)
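
For comparison with shelling out, here is a rough sketch of what a REST-based scan call could look like from the command line. Everything about the API is an assumption made for illustration: the CLAMAV_API_URL variable, the /scan path, the "file" form field, and the status-code convention (200 = clean, 406 = infected) would all need to be checked against whichever clamav API app is actually deployed.

```shell
# Hypothetical client for a separate clamav API app.
# CLAMAV_API_URL, /scan, the "file" field, and the 200/406 convention are
# placeholders, not a confirmed API.
scan_via_rest() {
  file="$1"
  # -s: no progress meter, -o /dev/null: discard the body,
  # -w '%{http_code}': print only the HTTP status, -F: multipart file upload.
  http_code=$(curl -s -o /dev/null -w '%{http_code}' \
    -F "file=@${file}" "${CLAMAV_API_URL}/scan") || http_code="000"
  case "$http_code" in
    200) echo "clean" ;;
    406) echo "infected"; return 1 ;;
    *)   echo "error"; return 2 ;;   # network failure or unexpected status
  esac
}
```

The app would make the equivalent request from its backend HTTP client; the curl version is only meant to show how small the calling side is.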

kryswisnaskas · 2020-12-08T15:28:36Z

Nice. And it looks like clamav is one of the top free solutions for virus scanning.

I haven't tested it yet, but had a couple of questions:

----------- SCAN SUMMARY -----------
Known viruses: 8946438
Engine version: 0.102.4
Scanned directories: 1
Scanned files: 0
Infected files: 0
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 26.048 sec (0 m 26 s)

How should we read this scan summary? Did it actually scan any data? (0 MB, 0 files).

Do we have any idea how big the uploaded files will be?
I am a bit concerned about the time it took (that could be due to not understanding what it actually scanned).

I took a look at the architecture behind setting up another cloud.gov app exposing the clamav API. It definitely is an interesting approach. I think the advantage in that scenario would be that we could call the service via a REST API from our backend. While there are ways to call a shell command from Node.js, using a REST API seems like a better interface.
We would need to figure out how long it would take to scan one or several files though, both when calling a shell command and when posting to the REST API.

The downside, in addition to the "cons" listed above, could be a more complex setup, potentially bringing in more points of failure.

kryswisnaskas · 2020-12-08T15:32:39Z

memory overhead limited to a single clamav app instance, rather than having to absorb it on every instance

This would be an important advantage

rahearn · 2020-12-08T16:54:43Z

Did it actually scan any data? (0 MB, 0 files)

Oops, no, needed a -r flag.

vcap@a1165ea4-15df-418f-4ff4-31f9:~$ clamscan -d $CLAMAV_DATA_DIR -r build

----------- SCAN SUMMARY -----------
Known viruses: 8947852
Engine version: 0.102.4
Scanned directories: 15
Scanned files: 151
Infected files: 0
Data scanned: 10.37 MB
Data read: 4.63 MB (ratio 2.24:1)
Time: 22.072 sec (0 m 22 s)

I am a bit concerned about the time it took

Given how small the difference is between this scan (22 s for 151 files) and the original one (26 s for 0 files), it looks like the vast majority of the time is startup overhead. That would be mitigated by running clamd in the background and using clamdscan instead of clamscan.
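
To make the clamd option concrete: clamscan loads the signature database on every invocation, while clamd loads it once and clamdscan just hands files to the running daemon. A minimal sketch of preferring the daemon when it is available follows. The CLAMD_SOCKET variable and its default path are hypothetical; the real socket path would come from the LocalSocket setting in clamd.conf.

```shell
# Pick clamdscan when a clamd socket exists, otherwise fall back to clamscan.
# CLAMD_SOCKET and its default path are assumptions for illustration.
choose_scanner() {
  if [ -S "${CLAMD_SOCKET:-/tmp/clamd.sock}" ]; then
    echo "clamdscan"
  else
    echo "clamscan"
  fi
}

# Scan a single path with whichever scanner was chosen.
scan_with_best_available() {
  if [ "$(choose_scanner)" = "clamdscan" ]; then
    # clamdscan uses the database already loaded by clamd, so no -d here.
    clamdscan --no-summary "$1"
  else
    clamscan --no-summary -d "$CLAMAV_DATA_DIR" "$1"
  fi
}
```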

I'm definitely leaning towards the separate app/REST API for our use rather than running clamd as a sidecar.

rahearn · 2020-12-10T19:11:08Z

Decision made: we will run a separate ClamAV app with a REST API. Work tracked here: HHS#203

rahearn added 4 commits November 30, 2020 14:24

Merge pull request #182 from adhocteam/main

742896c

Add initial New Relic integration

Merge pull request #184 from adhocteam/main

4c86b5f

CI: Accessibility scan

Ignore the transient terraform directory

7d4dc45

Add clam and update process as sidecar

2a4f325

rahearn requested a review from SarahJaine December 4, 2020 20:59

rahearn marked this pull request as draft December 4, 2020 21:00

Remove log config in favor of logging to stdout

1536183

rahearn requested review from jasalisbury and kryswisnaskas December 4, 2020 21:33

mogul mentioned this pull request Dec 7, 2020

[research 3d]: Malware-detection sidecar buildpack for cloud.gov apps GSA/data.gov#1716

Open

2 tasks

rahearn mentioned this pull request Dec 10, 2020

As a system operator, I want a clamav app running, so I can scan files for malware on demand (RA-5) HHS/Head-Start-TTADP#203

Closed

rahearn closed this Dec 10, 2020
