I work in a hospital. Our doctors work on-call during the night and evening hours. There may be times when no patients are coming so they can rest. At other times, many patients will be there at once. They write down when they started and when they stopped treating a patient. With the lubridate package, I can transform these data into intervals with specific dates. The length of these intervals will vary a lot as treatment may be more or less complicated. Also when a lot is going on, a doctor may be going back and forth between patients. So a typical entry will look like this: "2016-06-11 21:45:00 UTC" "2016-06-11 22:35:00 UTC"

To see which times of day are usually very busy and which are rather slow, I would like to use these data. This should also be possible for different days of the week. The whole thing should probably look like a bar graph showing what the average occupation would be at any time of day (for example 100% occupation between 8 and 9 pm, 40% between 1 and 2 am). My problem is that I don't know how to do that. ggplot will not handle intervals and I have not found any package that will do this average or percentage for intervals.

I hope I was able to make clear what I need and what my problems are. I'm not an experienced programmer but happy to learn.

Thanks a lot



Sorry, I should have thought of that. So here is as far as I have come:

>Beginn<-parse_date_time(Daten$Beginn,"dmy HM“,tz="CET“)
>Ende<-parse_date_time(Daten$Ende,"dmy HM",tz="CET“)
##Interval with date information
##Intervals stripped of date
>Daten$Beg<-as.POSIXct(strftime(Beginn, format="%H:%M:%S"), format="%H:%M:%S")
> Daten$dur<-as.duration(Daten$Intervall)
> Daten$Interv<-as.interval(Daten$dur , Daten$Beg)
## add weekdays

This way I have time intervals pointing to the same date and I have the weekdays to sort the data by. This is where I am stuck, because I know of no way to some kind of histogram on intervals. I could just use starting dates, but that would be heavily skewed, because an interval might be between 5 mins and 2 hours.

I hope the code helps. If you need some exemplary data, just tell me.

EDIT (2): These are the raw data https://www.dropbox.com/s/tok32wzt9wjmjih/Dienstdatum.csv?dl=0

and the output from dput: https://www.dropbox.com/s/wgtw68rw9n0ksct/Output%20Dput.rtf?dl=0

I am afraid the data are not as well structured as could be, but it should still work. Not sure if it is a good idea to post the output inline, so I provided the file.

Question author Valentin | Source



So after digging around and trying different things, here is what i came up with and what works for me. The salution is somewhat convoluted and verbose bus apparently gives exact results. In trying to find the answer, I learned about the vonders of vectorisation (pardon my german accent) since producing vectorised code cut the time it took to compute the results to about 3 mins whereas before i stopped calculations after about 96 hours without having completed.

Please note that the list of documented dates (not every doctor will have completed documentation of his shift) is an excel sheet with simple dates. the list of documented times working intervals is a date and time someone started seeing a patient in one column and stopped seeing that patient in another. The next row will be similar start and stop time and date.

All the variables in the text are in german or are abbreviations of german words, but I hope my comments are sufficient to understand what's going on. Also, a lot of the code is for problems specific to my situation.

Special thanks to users PhiSeu and user3507085 who helped me with different aspects of the solution.

#read datespackage(lubridate)Daten<-read.csv2(„file.csv")#convert start dates to POSIXDaten$Beginn<-parse_date_time(Daten$Beginn,"dmy HM",tz="CET")#prevent overlap by adding one secondDaten$Beginn<-Daten$Beginn+1#convert end dates to POSIXDaten$Ende<-parse_date_time(Daten$Ende,"dmy HM",tz="CET")#remove empty rowsDaten<-na.omit(Daten)#create intervals in which people workedDaten$Intervall<-interval(Daten$Beginn,Daten$Ende)#read dates on which people workeddoku<-read.csv2(„dates.csv“,header=FALSE)doku<-parse_date_time(doku$V1,"%d.%m.%Y",tz="cet")#create a start time of 09 A.M. for shiftsdoku<-data.frame(cbind(doku,doku+32400))#add column namesnames(doku)<-c("Datum","Beginn")#convert to POSIXdoku$Datum<-as.POSIXct(doku$Datum,origin="1970-01-01",tz="cet")doku$Beginn<-as.POSIXct(doku$Beginn,origin="1970-01-01",tz="cet")#Loop to create 15 min intervals for each documented shift spanning 24 hour against which actual working hours will be checkedbegin <- as.POSIXct(doku$Beginn)# copy begin time for loopbegin_new <- begin# create duration objectaufl <- duration(15, "mins")# count times for looptimes <- 24*60/15# create dataframe with begin timeIntervall <- data.frame(begin,stringsAsFactors = FALSE)for (i in 1:times){  cat("test",i,"\n")  # save old time for interval calculation  begin_start <- begin_new  # add 15 Minutes to original time  begin_new <- begin_new + aufl  cat(begin_new,"\n")  # create an interval object between   new_dur <- interval(begin_start,begin_new)  # bind to original dataframe  Intervall <- cbind(Intervall,new_dur) }# Add column namesvec_names <- paste0("v",c(1:(times+1)))colnames(Intervall) <- vec_names#create a matrix of the number of seconds worked in each of the above 15 intervals by checking the amount of intersection between 15 intervals and documented intervals of worktest<-vector()Tabelle<-matrix(nrow=length(doku$Beginn),ncol=times)Tabelle[is.na(Tabelle)]<-0for (j in 1:length(doku$Beginn)){for (k in 1:times){test<-as.duration(intersect(Daten$Intervall,Intervall[j,k+1]))test[is.na(test)]<-0test<-sum(test)Tabelle[j,k]<-test}}#cadd start time to the above matrixAusw<-data.frame(cbind(Tabelle,begin))#convert to POSIXAusw$begin<-as.POSIXct(Ausw$begin,origin="1970-01-01",tz="cet")##analysis of data#common to all days of the week#create labels for 15 min intervalsLabels<-c("09","09:15","09:30","09:45","10","10:15","10:30","10:45","11","11:15","11:30","11:45","12","12:15","12:30","12:45","13","13:15","13:30","13:45","14","14:15","14:30","14:45","15","15:15","15:30","15:45","16","16:15","16:30","16:45","17","17:15","17:30","17:45","18","18:15","18:30","18:45","19","19:15","19:30","19:45","20","20:15","20:30","20:45","21","21:15","21:30","21:45","22","22:15","22:30","22:45","23","23:15","23:30","23:45","00","00:15","00:30","00:45","01","01:15","01:30","01:45","02","02:15","02:30","02:45","03","03:15","03:30","03:45","04","04:15","04:30","04:45","05","05:15","05:30","05:45","06","06:15","06:30","06:45","07","07:15","07:30","07:45","08","08:15","08:30","08:45")##analysis for weekends#how many percent people worked on average in any of the 15 min intervals on a saturday or sundayWochenende<-apply(Ausw[Ausw$wtag==c(1,7),1:times],MARGIN=2,FUN=sum)Prozent<-Wochenende/length(Ausw$begin[Ausw$wtag==c(1,7)]) /as.numeric(aufl)*100#add labelsnames(Prozent)<-Labels#plot as barplot and add axis labelsb=barplot(Prozent,axes = F,axisnames=F,main="Durchschnittliche Arbeitsbelastung am Wochenende",sub="über 100%: Übergabezeiten",xlab="Uhrzeit",ylab="Prozent")axis(1,at=c(b[seq(1,length(Labels),4)],b[length(b)]+diff(b)[1]),labels = c(Labels[seq(1,length(Labels),4)],"09"))axis(2,at=seq(0,160,25),las=2)##analysos monday to fridayWoche<-apply(Ausw[Ausw$wtag==c(2,3,4,5,6),1:times],MARGIN=2,FUN=sum)Prozent2<-Woche/length(Ausw$begin[Ausw$wtag==c(2,3,4,5,6)]) /as.numeric(aufl)*100#add labelsnames(Prozent2)<-Labels#plot as barplot and add axis labelsb2=barplot(Prozent2,axes = F,axisnames=F,main="Durchschnittliche Arbeitsbelastung Montag - Freitag",,xlab="Uhrzeit",ylab="Prozent“,ylim=c(0,100))axis(1,at=c(b2[seq(1,length(Labels),4)],b2[length(b2)]+diff(b2)[1]),labels = c(Labels[seq(1,length(Labels),4)],"09"))axis(2,at=seq(0,160,25),las=2)
Answer author Valentin

Ask about this question here!