ISF Logo   IS Forum
Forum Index Register Members List Events Mark Forums Read Help

Go Back   International Skeptics Forum » General Topics » Computers and the Internet
 


Welcome to the International Skeptics Forum, where we discuss skepticism, critical thinking, the paranormal and science in a friendly but lively way. You are currently viewing the forum as a guest, which means you are missing out on discussing matters that are of interest to you. Please consider registering so you can gain full use of the forum features and interact with other Members. Registration is simple, fast and free! Click here to register today.
Reply
Old 16th May 2020, 07:36 AM   #1
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
Covid Database

I'm creating a database for capturing data surrounding covid-19. There are many organizations that have implemented such databases, but there are some features I'd like that I don't see elsewhere. (Also this is a cabin fever skills refresher, so I'm glad to invest the time.)

Features I intend to include:

* able to store data at Country, State, and/or Subdivision (e.g. County) granularity
* use standardized codes for geographic keys
* implement numeric facts (e.g. Cases, Deaths) without schema change
* support multiple versions of the same numeric fact

I'm doing the project in MS SQL Server but I'm avoiding features specific to the product in case I want to port it to a different DB platform.

I'll be posting here as I progress and I welcome feedback.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 16th May 2020, 01:19 PM   #2
portlandatheist
Illuminator
 
portlandatheist's Avatar
 
Join Date: Jun 2007
Posts: 3,297
Very cool. Is this going to be just a personal project or is there going to be a publicly available interface of some kind to the database?
portlandatheist is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 16th May 2020, 01:37 PM   #3
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
I definitely would like to make it public. Minimally I'll create pivots in google sheets or excel and make those public.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 16th May 2020, 01:50 PM   #4
Skeptic Ginger
Nasty Woman
 
Skeptic Ginger's Avatar
 
Join Date: Feb 2005
Posts: 80,952
Probably not what you have in mind for the thread, but maybe these kind of links could be of use to people reading the thread.

CDC MMWR: Characteristics of Health Care Personnel with COVID-19 — United States, February 12–April 9, 2020

Smart Air: The Ultimate Guide to Homemade Face Masks for Coronavirus.
It's an excellent article evaluating face mask materials, thorough study of all sorts of materials.

COVID-19 Superspreader Events in 28 Countries: Critical Patterns and Lessons

Coronavirus Treatment Could Lie in Existing Drugs has a list of current drugs under study and the status of the research. But it is from back in Feb so I'll look for a more recent source.
__________________
Privatize the profits and socialize the losses. It's the American way. That's how Mnuchin got rich. Worse, he did it on the backs of elderly people who had been conned into reverse mortgages. Mnuchin paid zero, took on the debt then taxpayers bailed him out.

Space Force.
Because feeding poor people is socialism.

Last edited by Skeptic Ginger; 16th May 2020 at 01:51 PM.
Skeptic Ginger is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 17th May 2020, 04:55 AM   #5
rjh01
Gentleman of leisure
Tagger
 
rjh01's Avatar
 
Join Date: May 2005
Location: Flying around in the sky
Posts: 25,713
Where will you get the data from? Do you need data sources for specific countries?
__________________
This signature is for rent.
rjh01 is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 18th May 2020, 08:05 AM   #6
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
Originally Posted by rjh01 View Post
Where will you get the data from? Do you need data sources for specific countries?
To start with, the NY Times database at County level of detail, US only.

For global data, there are a couple of options I'm considering. John Hopkins database is the leading candidate.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 18th May 2020, 10:52 AM   #7
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
Some design features I'm not implementing which experience tells me could bite me in the ass down the road:
  • No Date table. Implications: no integer keys for dates (performance, who cares with this relatively small data set), no easy prevention of date gaps on charts, custom date rollups become complicated.
  • No staging area -- everything in one database.
  • No incremental updates. Data refresh will require full reload.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 19th May 2020, 12:23 AM   #8
rjh01
Gentleman of leisure
Tagger
 
rjh01's Avatar
 
Join Date: May 2005
Location: Flying around in the sky
Posts: 25,713
Data sources
For Australia you can use https://www.health.gov.au/news/healt...current-status
The above is primary data.

For the rest of the world you can use https://www.worldometers.info/coronavirus/#countries

I keep a record of the number of cases in a spreadsheet. This includes a couple of graphs you can use. I can give you that if you wish. Just send me a PM. I use the above sources for my spreadsheet. Please note all sources will give you different information. One of the issues is when is midnight?

I assume you have seen this https://coronavirus.jhu.edu/map.html as you have mentioned John Hopkins. Please note one of his sources is worldmeters so despite the fact that he is the go-to site it is a tertiary source of information.

You should also ask where does everyone get their information from? Then try to get it from primary sources. Much better than other sources. Otherwise what value are you adding to the data?

Edit. I see the John Hopkins give you sources, including the one I mentioned above. Maybe you should use their sources and do it better than John Hopkins. List your assumptions and methods.
__________________
This signature is for rent.

Last edited by rjh01; 19th May 2020 at 12:27 AM.
rjh01 is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 19th May 2020, 06:47 AM   #9
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
Many thanks!

Of course I'm paying attention to data sourcing. And yes, I did notice that John Hopkins sources from worldometer (among many others). Coincidentally, they took some flack in the news recently for sourcing from worldometer (which I dont care about). And yes indeed, if I had time I would try to eliminate intermediary sources.

If there is value to this project -- a big if -- it won't be due to better data sourcing/quality than these various data teams that have far more resources than me. It will be due to a superior, flexible schema that allows new numeric facts without schema change.

Ideally (and optimistically) I will prove the value, then hand it off to someone else for care and feeding.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 19th May 2020, 07:15 AM   #10
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
The devil is in the devilish details. Per usual.

John Hopkins treats the cruise ships (and military?) as separate geographic entities. This won't integrate with my nicely arranged/keyed geography tables. I'll add a country named Other and these weird entities will be States inside of Other.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 19th May 2020, 07:25 AM   #11
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
So far as I can tell, worldometer doesn't provide a downloadable file with time series data, which is an essential requirement.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 19th May 2020, 09:13 AM   #12
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
Ugh. The John Hopkins downloadable data is structured in a way to make it highly impractical to import. Dates are spread horizontally...

Geography May1 May2 May3 ... etc

as opposed to...

Geography Date

I need tall and skinny, not short and and fat (and fatter by the day).

With a variable number of columns, in order to normalize this data I'll need to write procedural code (which I avoid like the plague, wisely). I can do it in a stored procedure using a cursor, but that will likely be SQL Server specific. Ugh again.

Alternately, they have a separate file for each day. Not useful either.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump

Last edited by varwoche; 19th May 2020 at 09:14 AM.
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 19th May 2020, 10:52 AM   #13
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
Our World In Data to the rescue. Clean and ready to import.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 19th May 2020, 04:35 PM   #14
rjh01
Gentleman of leisure
Tagger
 
rjh01's Avatar
 
Join Date: May 2005
Location: Flying around in the sky
Posts: 25,713
Originally Posted by varwoche View Post
Ugh. The John Hopkins downloadable data is structured in a way to make it highly impractical to import. Dates are spread horizontally...

Geography May1 May2 May3 ... etc

as opposed to...

Geography Date

I need tall and skinny, not short and and fat (and fatter by the day).

With a variable number of columns, in order to normalize this data I'll need to write procedural code (which I avoid like the plague, wisely). I can do it in a stored procedure using a cursor, but that will likely be SQL Server specific. Ugh again.

Alternately, they have a separate file for each day. Not useful either.
If you were using Microsoft Excel the procedure for that is

1. Copy the data into the clipboard.
2. Open a new spreadsheet
3. Right click and click on paste special ...
4. Tick transpose box (bottom right corner)
5. Click ok.

Result: Everything is rotated 90 degrees.
__________________
This signature is for rent.
rjh01 is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 19th May 2020, 05:09 PM   #15
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
Originally Posted by rjh01 View Post

If you were using Microsoft Excel the procedure for that is

1. Copy the data into the clipboard.
2. Open a new spreadsheet
3. Right click and click on paste special ...
4. Tick transpose box (bottom right corner)
5. Click ok.

Result: Everything is rotated 90 degrees.
Yes thanks, I appreciate your input!

Alas, in addition to my Office subscription having expired, it's not practical to automate for a variety of reasons.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old 23rd May 2020, 03:25 PM   #16
xterra
So far, so good...
 
xterra's Avatar
 
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 3,548
Varwoche, try LibreOffice as a replacement for MS Office.

There is a slight learning curve but in my (very unsophisticated) spreadsheet use I haven't found any functions in MO that are not implemented in LO.

As the name implies it is free.
__________________
Over we go....

Last edited by xterra; 23rd May 2020 at 03:28 PM.
xterra is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old Yesterday, 08:11 AM   #17
varwoche
Penultimate Amazing
 
varwoche's Avatar
 
Join Date: Feb 2004
Location: Puget Sound
Posts: 14,273
Originally Posted by xterra View Post
Varwoche, try LibreOffice as a replacement for MS Office.

There is a slight learning curve but in my (very unsophisticated) spreadsheet use I haven't found any functions in MO that are not implemented in LO.

As the name implies it is free.
I'm using google sheets currently, which I reluctantly pay for. I'll definitely check out libre.

My main needs are (1) pivot tables and (2) shareability of same.
__________________
To survive election season on a skeptics forum, one must understand Hymie-the-Robot.
My authority is total - Trump
varwoche is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old Yesterday, 08:26 AM   #18
Trebuchet
Penultimate Amazing
 
Trebuchet's Avatar
 
Join Date: Nov 2003
Location: Port Townsend, Washington
Posts: 26,981
I've been tracking new deaths and cases in a spreadsheet (Apache Open Office, although I'm given to understand Libre is better) and doing a 10-day running average to smooth out the weekend reporting anomaly. I'm wondering if 10 days is the best sort of period for that however.
__________________
Cum catapultae proscribeantur tum soli proscripti catapultas habeant.
Trebuchet is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old Today, 12:08 AM   #19
rjh01
Gentleman of leisure
Tagger
 
rjh01's Avatar
 
Join Date: May 2005
Location: Flying around in the sky
Posts: 25,713
Originally Posted by Trebuchet View Post
I've been tracking new deaths and cases in a spreadsheet (Apache Open Office, although I'm given to understand Libre is better) and doing a 10-day running average to smooth out the weekend reporting anomaly. I'm wondering if 10 days is the best sort of period for that however.
I would suggest 7 days would be better. Or 14 if you do not mind sluggish changes.
__________________
This signature is for rent.
rjh01 is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Old Today, 08:36 AM   #20
Trebuchet
Penultimate Amazing
 
Trebuchet's Avatar
 
Join Date: Nov 2003
Location: Port Townsend, Washington
Posts: 26,981
Originally Posted by rjh01 View Post
I would suggest 7 days would be better. Or 14 if you do not mind sluggish changes.
Thanks, that's what I was wondering. I may put 7 day averages in additional columns and see how that changes things.
__________________
Cum catapultae proscribeantur tum soli proscripti catapultas habeant.
Trebuchet is offline   Quote this post in a PM   Nominate this post for this month's language award Copy a direct link to this post Reply With Quote Back to Top
Reply

International Skeptics Forum » General Topics » Computers and the Internet

Bookmarks

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump


All times are GMT -7. The time now is 04:07 PM.
Powered by vBulletin. Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum began as part of the James Randi Education Foundation (JREF). However, the forum now exists as
an independent entity with no affiliation with or endorsement by the JREF, including the section in reference to "JREF" topics.

Disclaimer: Messages posted in the Forum are solely the opinion of their authors.