PDF to Text Conversion

This project combines a couple of well-trodden paths: PDF to text conversion, and then running an app in the background with audio playback. It introduced some new concepts to me and, based on a trawl of the usual resources for problem-solving, threw up at least a couple of issues that are worth recording.

The TL;DR version is that PDF parsing gets into pretty complicated territory – unless you happen to know C well. There are open source libraries out there, but they didn’t hit the mark for me. I’ve implemented my own parser which is crude, but works. More or less!

Any PDF manipulation in iOS is going to depend on the Quartz 2D library somewhere along the line. Whether you call it directly or rely on another API that wraps it is a matter of choice. I looked at a couple. PDFKitten has a lot of functionality and seems to be by far the most sophisticated open source library but the documentation didn’t cover the simple requirement that I had – text extraction. There’s another one called pdfiphone that I struggled to get to work, and which epitomises the main challenge that I had with this project: I have only a rudimentary knowledge of C, which is what you’re getting into with Quartz.

So the basic structure of a PDF is a series of tags associated with different types of payload. You break the document down into pages, and process each page as a stream, calling C functions associated with the tags. This is a simple adaptation of example code straight from the Quartz documentation:

// myTable must already exist, created with CGPDFOperatorTableCreate().
// Register the callback before scanning, so that the first page is covered too:
CGPDFOperatorTableSetCallback(myTable, "TJ", getString);

for (NSInteger thisPageNum = 0; thisPageNum < numOfPages; thisPageNum++)
{
   // PDF page numbers start at 1. The page comes from a 'Get' function,
   // so it isn't ours to release.
   CGPDFPageRef currentPage = CGPDFDocumentGetPage(*inputPDF, thisPageNum + 1);
   CGPDFContentStreamRef myContentStream = CGPDFContentStreamCreateWithPage(currentPage);
   CGPDFScannerRef myScanner = CGPDFScannerCreate(myContentStream, myTable, NULL);
   CGPDFScannerScan(myScanner);
   CGPDFScannerRelease(myScanner);
   CGPDFContentStreamRelease(myContentStream);
}

Before the loop I register my own C function ‘getString’ as the callback for when the stream encounters the tag “TJ”. Here’s the first part that was new to me: the blending of C and Objective C. My function, which is an adaptation of code I found here, looks like this:

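void getString(CGPDFScannerRef inScanner, void *userInfo)
{
   CGPDFArrayRef array;
   // Nothing to do if the operand on the stack isn't an array:
   if (!CGPDFScannerPopArray(inScanner, &array))
      return;
   size_t count = CGPDFArrayGetCount(array);
   for (size_t n = 0; n < count; n++)
   {
      // TJ arrays mix strings with numbers; skip anything that isn't a string.
      CGPDFStringRef string;
      if (CGPDFArrayGetString(array, n, &string))
      {
         // CGPDFStringCopyTextString returns an owned CFStringRef ('Copy'),
         // so hand ownership to ARC rather than leaking it:
         NSString *data = (__bridge_transfer NSString *)CGPDFStringCopyTextString(string);
         [globalSelf appendMe:data];
      }
   }
}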

So there are a couple of things going on here: this code is simply looking for a string – well, a Quartz style CGPDFStringRef – in the payload passed in by the stream processing. If it finds one, it converts it into an NSString via some ‘bridge casting’ – something I’ve come across before in working with the keychain, and which you need for ARC compliance. I then take that string and pass it to a method called appendMe, which appends it to a property.

It’s not possible to call ‘self’ from a C function. There are a number of possible ways around this, some of which get pretty nasty. The most elegant that I found was this:

static P2VConverter *globalSelf;

-(void)setMyselfAsGlobalVar
{
   globalSelf = self;
}

…which assigns an instance of the class that I created to do the PDF processing to a static variable called globalSelf, and which I can then refer to as an alternative to self. To say this implementation isn’t particularly elegant or memory efficient is an understatement – but it works.
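For what it’s worth, a global isn’t the only route. The third argument to CGPDFScannerCreate, which is NULL in my loop, is handed back to every callback as the userInfo parameter. What follows is only a sketch of that context-pointer pattern in plain C, which compiles as-is in an Objective C file. Collector, collectString and scanStrings are names made up for illustration and are not part of Quartz; in the real thing the pointer would be a bridged reference to the converter object.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the converter object that collects the text. */
typedef struct {
    char text[256];
} Collector;

/* Similar in shape to a Quartz operator callback: the last argument is
   whatever pointer was supplied when scanning was set up. */
typedef void (*OperatorCallback)(const char *payload, void *userInfo);

/* Callback: recover the collector from userInfo instead of from a global. */
void collectString(const char *payload, void *userInfo)
{
    Collector *collector = (Collector *)userInfo;
    size_t used = strlen(collector->text);
    size_t room = sizeof(collector->text) - used - 1;
    strncat(collector->text, payload, room); /* never overruns the buffer */
}

/* Hypothetical scanner: calls the callback once for each string it finds. */
void scanStrings(const char *const *strings, size_t count,
                 OperatorCallback callback, void *userInfo)
{
    for (size_t i = 0; i < count; i++) {
        callback(strings[i], userInfo);
    }
}
```

Calling scanStrings with the address of a Collector means the callback never needs to know about any static variable, and two conversions could run side by side without treading on each other.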

There is a rich set of tags defined by a published PDF standard – all 800 pages of it – that tell whatever is rendering the document what to do with it. The best general explanation I found was this. There is a relatively small set of text related tags and TJ seems to be the simplest. It’s also the only one that I was able to adapt from other examples. I may come back to this again.

The way I tested this was to convert an HTML page into a PDF using Safari. The more complicated the text structure in your input document – say multiple columns per page, text boxes etc – the worse this simple extraction mechanism is going to cope.

On to email based importing of files. This isn’t something that I’d ever looked at before and it turned out to be a little more complicated than I expected. The amendments to the Info.plist are pretty trivial, creating the association between the file type and the app. So in the PDF reader, when you launch the app in the contextual menu, what actually happens is that the file is copied to the app’s Documents folder [into an Inbox subfolder], and a file:// style URL which points at it is passed to the AppDelegate – specifically, the application:handleOpenURL: method. I’d assumed in the first instance that I’d import the header for the view controller into the AppDelegate – and this is a single view app, so ViewController.h – instantiate the VC, call a method I expose in the header and I’d be done:

ViewController *thisVC = [[ViewController alloc] init];
[thisVC importFromEmail:url];

This is wrong, because it creates a second instance of the view controller rather than talking to the one that is actually on screen. It led to some peculiar side effects, which emerged when I started to try to set the point in the text to resume speech to reflect a change in the scrubbing control. This is what I came up with, which is a variant of this, adapted for a single view app:

UIStoryboard *storyboard = [UIStoryboard storyboardWithName:@"Main" bundle:nil];
UINavigationController *root = [[UINavigationController alloc]initWithRootViewController:[storyboard instantiateViewControllerWithIdentifier:@"EntryVC"]];
self.window.rootViewController= root;
ViewController *thisVC = (ViewController*)[[root viewControllers] objectAtIndex:0];
if (url != nil && [url isFileURL]) 
{
   [thisVC importFromEmail:url];
   NSLog(@"url: %@", url);
}

A couple of points of note. First, the method being called here is going to run before viewDidLoad or viewWillAppear. I normally do various inits in viewWillAppear, so I put them in a method that I call immediately in importFromEmail. Second, the string value for instantiateViewControllerWithIdentifier needs to be set in the storyboard.

Apart from the nasty callouts to C, what I spent most of my time working on was the scrubbing control functionality. I’ve created an IBAction method that will be called when the scrubber moves position. In the storyboard, I set the maximum value of the control to be 1, so to get the index of the new position after dragging the control, I multiply the fraction that the control reports in the IBAction by the length of the original pdf string.

Having stopped – not paused; more on that in a second – the playback, I then start a new playback in the IBAction method for the play button, having created a new string based on a range: the word index from the scrubber control as the start point, and then the length by subtracting that from the original pdf string length. There was a little bit of twiddling necessary to support this so that it would work when called multiple times.
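The arithmetic is simple enough to sketch in plain C. This is an illustration rather than the code from the app: TextRange is a made-up stand-in for an NSRange, rangeFromScrubber is a hypothetical name, and I’m assuming the scrubber reports a value between 0 and 1.

```c
#include <stddef.h>

/* Hypothetical stand-in for an NSRange: start index plus length. */
typedef struct {
    size_t location;
    size_t length;
} TextRange;

/* Map a scrubber value in [0, 1] to the remainder of the text. */
TextRange rangeFromScrubber(double scrubberValue, size_t textLength)
{
    /* Clamp, in case the control ever reports a value out of range. */
    if (scrubberValue < 0.0) scrubberValue = 0.0;
    if (scrubberValue > 1.0) scrubberValue = 1.0;

    TextRange range;
    /* Start point: the fraction of the way through the original string. */
    range.location = (size_t)(scrubberValue * (double)textLength);
    if (range.location > textLength) range.location = textLength;
    /* Length: whatever is left between the start point and the end. */
    range.length = textLength - range.location;
    return range;
}
```

With a 1000 character string and the scrubber at the halfway mark, that gives a start point of 500 and a length of 500, which is the range the new utterance would be built from.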

Part of the reason I took this approach was because the continueSpeaking method on the AVSpeechSynthesizer class didn’t seem to work. This was because I was using stopSpeakingAtBoundary instead of pauseSpeakingAtBoundary – something I’ve just noticed. Doh!

This has a knock-on effect, which is that the play button has to be stateful, with a continue if the pause was pressed, or a restart with the new substring if it’s because of the setting of the scrubber control. Given that the actual quality of the string conversion is pretty basic, the fix for this exceeds the usefulness of the app.
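I haven’t built it, but the decision the play button would need to make is small. A rough sketch in plain C, where the enum values and actionForPlayPress are hypothetical names rather than anything in the app or in AVFoundation:

```c
/* What the synthesizer is doing when the play button is pressed. */
typedef enum { PlayerStopped, PlayerSpeaking, PlayerPaused } PlayerState;

/* What the play button should do in response. */
typedef enum { ActionNone, ActionContinue, ActionStartFromScrubber } PlayAction;

/* Decide between continuing a paused utterance and starting a new one. */
PlayAction actionForPlayPress(PlayerState state, int scrubberMoved)
{
    if (state == PlayerSpeaking) {
        return ActionNone;              /* already playing */
    }
    if (state == PlayerPaused && !scrubberMoved) {
        return ActionContinue;          /* pick up where the pause left off */
    }
    return ActionStartFromScrubber;     /* new substring from the scrubber */
}
```

The scrubberMoved flag would be set in the scrubber’s IBAction and cleared whenever a new playback starts.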

A couple of final comments. I discovered that it’s best to keep the functionality in the IBAction method called for the scrubbing control changes pretty simple: basically just setting the property for the new word position. I got some peculiar results when trying to do some string manipulation for a property, because the method was being called before a prior execution to set it was completed.

Lastly, I encountered an odd, and as yet unresolved, bug when, as an afterthought, I added support for speech based on text input directly into a UITextField. I started seeing log errors of the type _BSMachError: (os/kern) invalid name (15). This appears to be quite a common one, as the number of upvotes on this question attests. I’m filing it in the same bucket as the stateful play button resolution: if the quality of the playback warranted it, I’d figure it out.


I was in two minds as to whether or not to write this app up given the mixed results, but I thought I might as well as it may be one of my last pieces of Objective C development.

I blogged a year ago, almost to the day, about my first Apple Watch app. I had one pretty serious watch project last summer, which I ended up canning around November. The idea was to sync accelerometer data from the watch with slow motion video, say of a golf swing. However, I ran into a problem with the precision of the data in AVAsset metadata, on which I have an as-yet unanswered question on StackOverflow.

I also ran into the same usage issues with the watch that have been widely reported in the tech press. While I really liked the notifications [and the watch itself, which is a lovely piece of kit], the absence of any other compelling functionality barely warranted the bother of charging the thing. The borderline utility really came to the forefront for me when I was travelling, for both work and holidays, with no roaming network access. The watch stayed at home.

I also think it’s pretty telling that I bought precisely zero apps for it before I sold it a couple of months ago.

I’ll be looking to replace my current iPhone 6 Plus later this summer and I’m toying with the idea of moving to Android. I tried it before and hated it, but that was with a pretty low-end phone that I got as a stopgap after my 4S had an unfortunate encounter with the washing machine. Java would be much more useful than Objective C in my working life, and it’s a potentially interesting route into the language.

Building an Electronic Programme Guide [Part 3]

This is my third and final write-up on the development of an electronic programme guide app. As of part 2, the main scrolling view with the programme details, scaled to length, is displayed.

Next up I wanted to have the ability to display programme details. This seemed like it was going to be pretty straightforward: when building the per-programme view, I included a button, which is transparent, and has the same dimensions as the view itself. The first problem is associating the programme details with the button itself. There is plenty of discussion on StackOverflow about how to do this in the least offensive way. I went with a category and then object association. This allowed me to set the GUID as a property for the button.

I rebuilt the app, hit the button and… entered into a fortnight of debugging an EXC_BAD_ACCESS error. I knew what the problem was: the ARC memory management was deallocating the button object once it was set. I tried lots of different options, such as adding the buttons to an array, set with various properties, and passing the array back to the main view controller. Nothing worked until I did more reading around the property attributes, and ended up redefining the Interface Builder defaults for the scrolling contentView to:

@property (nonatomic, strong) IBOutlet UIView *contentView;

That ‘strong’ means that everything in the view is held in memory. It has to be said that the app is very heavy on memory – as a direct consequence of that view object retention. It routinely occupies 63MB in my testing.

Next up is the popup that is rendered. So finding the programme itself is pretty easy, using an NSPredicate based on the GUID. What proved a bit harder to deal with is if the main view [the ‘contentView’ for the scrollView] is zoomed. As you have to add the popup view to the zoomed parent, the former is going to inherit the zoom setting. I couldn’t think of an elegant way around this so I worked around it in stages. First off, the popup sits on a blurred view of the current background:

// This is quite neat: make a CGRect of the currently visible part of the scrollview:
CGRect visibleRect = [scrollView convertRect:scrollView.bounds toView:contentView];
visualEffectView = [[UIVisualEffectView alloc] initWithFrame:visibleRect];
visualEffectView.effect = blurEffect;
visualEffectView.frame = contentView.bounds;
[contentView addSubview:visualEffectView];

Next, I register the scrollView offset in a property:

scrollOffSet = scrollView.contentOffset;

…set the zoomScale to 1, and disable the ability to scroll or zoom:

[scrollView setZoomScale:1.0];
scrollView.scrollEnabled = NO;
scrollView.maximumZoomScale = 1.0;
scrollView.minimumZoomScale = 1.0;

Placing the programme details subview is then relative to the currently visible rectangle:

float xForLittleView = visibleRect.origin.x + 30 ;
float yForLittleView = visibleRect.origin.y + 100;

CGRect progViewRect = CGRectMake(xForLittleView, yForLittleView, 350, 500);

I then have to undo the various view settings when the button to dismiss the view is touched:

[visualEffectView removeFromSuperview];
[littleView removeFromSuperview];
scrollView.scrollEnabled = YES;
scrollView.maximumZoomScale = 2.0;
scrollView.minimumZoomScale = 0.8;
[scrollView setZoomScale:zoomScale];
[scrollView setContentOffset:scrollOffSet];

It’s all a bit clunky, but it works. I imagine that this sort of interface plumbing actually happens quite a lot behind the scenes. That said, I may have missed a trick to do it in an easier way.

I’ll call out two more details that I wrestled with. The first is a search facility on the programme title. I wanted the NSPredicate to support as many search terms as the user entered. My initial idea was to split the UITextField input on spaces, and then loop through the resulting array, appending to a stringWithFormat, where all but the first element would be in the form:

AND (title CONTAINS[c] %@)

Having experimented with this, it appears that predicateWithFormat has to have the actual string passed to it, as opposed to a variable containing the string. Which I have to say strikes me as a little odd. The functional upshot of this is that I couldn’t support a variable number of search terms. I support up to three, and construct a separate predicateWithFormat for each possibility.
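One avenue I haven’t tried in the app: NSPredicate also has predicateWithFormat:argumentArray:, which takes the format string plus an array of arguments, so the number of terms wouldn’t need to be known up front. The string-building half of my original idea can be sketched in plain C. buildTitleFormat is a hypothetical name, and this only produces the format; the search terms themselves would go in the argument array.

```c
#include <stddef.h>
#include <stdio.h>

/* Write a predicate format with one "(title CONTAINS[c] %@)" clause per
   search term into out. Returns 0 on success, -1 if out is too small. */
int buildTitleFormat(size_t termCount, char *out, size_t outSize)
{
    const char *clause = "(title CONTAINS[c] %@)";
    size_t used = 0;

    if (outSize == 0) return -1;
    out[0] = '\0';

    for (size_t i = 0; i < termCount; i++) {
        /* Every clause after the first is joined on with AND. */
        int written = snprintf(out + used, outSize - used, "%s%s",
                               i == 0 ? "" : " AND ", clause);
        if (written < 0 || (size_t)written >= outSize - used) return -1;
        used += (size_t)written;
    }
    return 0;
}
```

For two search terms this produces (title CONTAINS[c] %@) AND (title CONTAINS[c] %@), with the two placeholders left for the predicate to fill in.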

One final problem that I couldn’t find a fix for was implementing a UITableView’s delegate methods in a class that I pass the view into as a parameter. I couldn’t find a way of getting the cellForRowAtIndexPath method to be called. The conclusion I came to was that the cause was setting the delegate to self, when ‘self’ was the custom object, rather than the view controller. It was largely a cosmetic thing [I’ve noticed that for complicated apps, I have a tendency to pile way too much code into the main viewController] so it was easily solved by leaving the code where it was.

Here’s what may be the final version of the app, showing a search result in the popup view, and the to/from dates for the EPG coverage:

Search Results

The other buttons that I haven’t talked about explicitly switch between days and initiate a download of EPG data, both of which are pretty straightforward. What’s still either ugly or hasn’t been fully implemented is the download progress indicator, and also the what’s-on-now quick look on the Apple Watch, as I want to have a mess around with something completely unrelated to this app: the motion detection capability.

I did add a quick fix to ‘justify’ the right hand side of the ‘table’ of programmes. Formerly they were falling off the contentView. I simply check if the rightmost width of the cell is going to be greater than the width of the contentView. If so, I set it to be the same as the width.
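In plain C the check amounts to something like the following. It’s a sketch rather than the code from the app: clampedCellWidth is a hypothetical name, and I’m reading the fix as pulling the cell’s right edge back to the edge of the contentView.

```c
/* Trim a cell's width so its right edge stays inside the content view. */
float clampedCellWidth(float cellX, float cellWidth, float contentWidth)
{
    /* A cell that starts beyond the edge has no room at all. */
    if (cellX >= contentWidth) {
        return 0.0f;
    }
    /* If the right edge overshoots, shrink the cell to fit. */
    if (cellX + cellWidth > contentWidth) {
        return contentWidth - cellX;
    }
    return cellWidth;
}
```

So a cell starting at 3900 with a width of 200 in a 4000 wide contentView is trimmed to 100, while cells that already fit are left alone.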

So that’s it. I have a pretty serviceable EPG app, which I use myself over the ad-funded variant I had before, which I guess is a fair indicator of utility. Main lesson learned: not knowing what those property attributes meant tripped me up really badly!

Building an Electronic Programme Guide [Part 2]

A little later than intended, here’s the follow-up to my first posting on building an EPG.

So, having marshalled the data into a structure that can be displayed, on to the guts of this app, which is the main UIView for the guide itself. The pre-requisite of that main view is configuring the scrollView: I’m not going to dwell on this too long, as it’s a well documented feature.

For the formatting of the content in that classic programme guide UI, with rows of variable width cells, I initially started with looking at collectionViews, but couldn’t find a way of doing it – or certainly an amenable way. I also briefly looked at customising a tableView, which I could add multiple elements to, but rejected it for the same reason.

I finally settled on a custom implementation, with each programme being represented by its own UIView, with:

  • the width: the duration in minutes divided by the number of minutes in the day, then multiplied by the width of the scrollview.
  • the height: this is a constant for all the cells, just based on experimenting with the scrollView.

For what it’s worth, my scrollView is 4000, and the row height is 60. [Actually it’s the width of the contentView the scrollView contains, but I’ll refer to the scrollView from here for purposes of readability.]
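As a quick illustration of the width calculation, here it is in plain C. cellWidthForDuration is a hypothetical name, and I’m taking the number of minutes in the day as 24 × 60.

```c
/* Width of a programme cell: its share of the day, scaled to the view. */
float cellWidthForDuration(float durationMins, float contentWidth)
{
    const float minutesInDay = 24.0f * 60.0f; /* 1440 */
    return (durationMins / minutesInDay) * contentWidth;
}
```

With a width of 4000, a six hour block comes out at 1000, and a typical 60 minute programme at roughly 167.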

Before I go on to calculating the [X,Y] coordinates of the top left corner of the subview, I need to recap on how the programme information is written to Core Data and then read back. At some point in the implementation as I envisage it, it will be possible for the user both to download more programme data, as only a couple of weeks’ worth comes down at a time, and to configure which channels to do the download for. To keep the management of the stored data simple, I decided to write all of the programme data to the same Core Data entity [think database table], and when the user repeats the download, with or without changing the channel configuration, I delete the existing data – a fairly blunt but effective instrument.

When it comes to retrieving the data, which I do on a per-channel basis, I need to search using an NSPredicate:

NSPredicate *predicate = [NSPredicate predicateWithFormat:@"(channel = %@) AND (startDate >= %@) AND (endDate <= %@)", channelName, startDateForSearch, endDateForSearch];
NSFetchRequest *allChannelDataReq = [[NSFetchRequest alloc] init];
[allChannelDataReq setEntity:[NSEntityDescription entityForName:@"Programme" inManagedObjectContext:managedObjectContext]];
[allChannelDataReq setPredicate:predicate];
[allChannelDataReq setIncludesPropertyValues:NO];
NSError *error = nil;
tempArrayOfFilteredProgData = [managedObjectContext executeFetchRequest:allChannelDataReq error:&error];

So this block of code is searching for hits on a given channel, startDate and endDate [attributes of the Entity], and loading them into the array. I’ve already noted these are unsorted, so…

NSSortDescriptor *sortByRunningOrderInt = [[NSSortDescriptor alloc] initWithKey:@"progOrderNum" ascending:YES];
NSArray *descriptors = [NSArray arrayWithObject:sortByRunningOrderInt];
NSArray *sortedProgDataForChannel = [channelData sortedArrayUsingDescriptors:descriptors];

…where the progOrderNum is the corresponding attribute in the Entity.

The sortedProgDataForChannel can now be looped through, for each programme:

- (NSMutableDictionary *)drawRect:(float)width startXPosn:(float)topLeftX rowNum:(int)rowNum forColour:(UIColor *)cellColour
{
   NSMutableDictionary *viewAndCoords = [[NSMutableDictionary alloc] init];
   float topLeftY = rowNum * rowHeight;
   float topLeftXForNextView = topLeftX + width;
   CGRect rectangle = CGRectMake(topLeftX, topLeftY, width, rowHeight);
   UIView *thisProgView = [[UIView alloc] initWithFrame:rectangle];
   thisProgView.backgroundColor = cellColour;
   thisProgView.layer.borderColor = [UIColor blackColor].CGColor;
   thisProgView.layer.borderWidth = 1.0;
   [viewAndCoords setObject:thisProgView forKey:@"view"];
   [viewAndCoords setObject:[NSNumber numberWithFloat:topLeftXForNextView] forKey:@"newTopLeftCoord"];
   return viewAndCoords;
}

The idea here is to calculate the width, as described above, and create the UIView [with a couple of frills like a border], and then return the X co-ordinate for the starting point for the next view. The Y coordinate is a simple calculation: I assign the channels a number, and then multiply that by the height of each cell.

That, in essence, is the primary logic for building the ‘wireframe’ for the programmes. I extract the text for a given programme’s title, start and end times, etc, and add those as labels programmatically. Again, these are calculated as offsets from the starting X coordinate and the calculated Y based on channel number.

There are a couple of gotchas, and I had to take a couple of bites at this – which is a polite way of saying I made a couple of idiotic mistakes – the visual results of which were so bad that I decided to record them for posterity. The first is an example of not reading the documentation. I misremembered the parameters for a UIView’s frame as being X+Y for the top left corner, and then X+Y for the bottom right corner, when they are actually X+Y for the top left corner followed by the width and height. This resulted in the following:

IMG_3084

It really is a thing of beauty :).

Next up was dealing with the simple matter of the start of the day. The NSPredicate I use extracts per-channel programme data for start times on a given date. I was then merrily plastering these onto the scrollView with the following results:

IMG_3098

Obviously a refinement is required, so first port of call was to make 5am the equivalent of the left edge of the scrollView. I decided that, rather than hardwiring this into the search parameter for the NSPredicate – in case I changed my mind later – I’d do it when looping through the channel data.

So first of all, I see if the start time of the given programme is >= 5am. If not I disregard it. If it is, I then calculate the offset for the X coordinate. This is based on calculating the difference between the start time of the programme and 5am in minutes. I then divide this by the number of minutes in the day and multiply it by the width of the scrollView. There’s an additional offset [which actually applies to all of the programme UIViews] for a channel title list down the left hand side of the scrollView.

-(int) calcPosOfFirstProg:(NSDate *)progStartTime
{
   int xPos = 0;
   // This start x posn allows for the channel name label
   int offSetForChanTitle = 100;
   // So: the left edge of the view equates to 5am.
   NSDateFormatter *dateFormatter = [[NSDateFormatter alloc] init];
   // formatting from just HH:mm didn't work because the progStartTime has a date.
   // need to extract today's date from progStartTime and bake it into the viewEdgeTime...

   [dateFormatter setDateFormat:@"dd/MM/yyyy"];
   NSString *dayDateForDisplayDay = [dateFormatter stringFromDate:progStartTime];
   NSString *viewEdgeTimeString = [NSString stringWithFormat:@"%@ 05:00", dayDateForDisplayDay];
   [dateFormatter setDateFormat:@"dd/MM/yyyy HH:mm"];
   NSDate *viewEdgeTime = [dateFormatter dateFromString:viewEdgeTimeString];
   // Time interval gives us the number of seconds:
   NSTimeInterval timeDiff = [progStartTime timeIntervalSinceDate:viewEdgeTime];
   // All other calculations on positioning have been in minutes so:
   float minsBetweenTimes = timeDiff / 60;
   NSLog(@"minsBetweenTimes is %f", minsBetweenTimes);
   // 24 hours * 60 minutes, to match the scale used for the cell widths:
   float minutesInDay = 1440;
   xPos = ((minsBetweenTimes / minutesInDay) * scrollViewWidth) + offSetForChanTitle;
   return xPos;
}
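Stripped of the date handling, the positioning arithmetic can be sketched in plain C. This isn’t the code from the app: xPositionForStart is a hypothetical name, the start time is passed in as minutes after midnight purely to keep the sketch self-contained, and I’m taking the number of minutes in the day as 24 × 60.

```c
/* X coordinate for a programme, with the left edge of the view at 5am.
   Returns -1 for programmes that start before 5am, which are skipped. */
int xPositionForStart(int startMinsAfterMidnight, float contentWidth)
{
    const int edgeMins = 5 * 60;              /* 5am, the left edge */
    const int channelLabelWidth = 100;        /* room for the channel names */
    const float minutesInDay = 24.0f * 60.0f;

    if (startMinsAfterMidnight < edgeMins) {
        return -1;
    }
    float minsFromEdge = (float)(startMinsAfterMidnight - edgeMins);
    return (int)((minsFromEdge / minutesInDay) * contentWidth) + channelLabelWidth;
}
```

A programme starting at 5am lands at 100, just clear of the channel name label, and with a width of 4000 one starting at 5pm lands at 2100.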

That channel listing is a bit ugly – it’s why I bake the channel name into the programme summary. What I’d really like to do is have a “floating” listing that the scrollView passes under. I’ve used key value observers a lot in the past, so one approach would be to set a KVO on the delegate that returns the current scroll scale value, and to resize based on the changes in scale. One to come back to…

It would, of course, be necessary to do the same for the right hand side of the scrollView, to make sure that for a given cell, there’s enough room to display it. I haven’t gotten round to this yet, so late finishing programmes dangle precariously into space.

Next up I’ll talk about creating the pop-up view for the programme details, including navigating an ARC related memory error, which took me a loooooooong time to figure out!