How does the Twitter app know my location?

[Edit: the answer to this is actually very simple. I’d been focusing so much on the various mobile frameworks that I forgot about the obvious one: it’s my source IP. There are services out there that take your IP address and, based on service provider information, turn it into a location. The same mechanism is available whether you are using your browser – where you will commonly see location-specific ads – or an app.]

OK, maybe not the most thrilling of titles, but I’ve been interested in location services pretty much since I started developing for iOS, and something happened this morning while using the Twitter app that piqued my interest.

I followed a link to a site, which opened in an embedded browser – so a UIWebView. At the bottom of the article, amid the usual rubbish, was a link to another site telling me how a millionaire in my home town, named in the ad, was earning an unlikely amount working from home.

The thing is that in my Location Services settings, I have the Twitter app set to ‘never’. There are a couple of other possible candidates. I wondered whether the UIWebView was inheriting some settings from Safari, but I have the Safari Websites setting on ‘never’ too. Also, the Location Manager is started by the host app, so the relevant privacy setting should be that app’s.

Looking at the other System Services under Location Services, there’s one other candidate: Location-Based iAds. I’ve not used iAd in my own apps, but I’ve just checked: iAd banners are views embedded in native apps, not in UIWebViews. And anyway, I have that setting disabled.

There are a few other System Services that I have switched on, such as Wi-Fi Networking and Location-Based Alerts, none of which should have anything to do with the Twitter app.

So what’s going on? Wild conspiracy theories aside, I can’t understand how the app could be getting my location when the primary privacy setting for the app is ‘never’.

PDF to Text Conversion

This project combines a couple of well-trodden paths: PDF to text conversion, and then running an app in the background with audio playback. It introduced some new concepts to me and, judging by a trawl of the usual problem-solving resources, at least a couple of issues that are worth recording.

The TL;DR version is that PDF parsing gets into pretty complicated territory – unless you happen to know C well. There are open source libraries out there, but they didn’t hit the mark for me. I’ve implemented my own parser which is crude, but works. More or less!

Any PDF manipulation in iOS is going to depend on the Quartz 2D library somewhere along the line. Whether you call it directly or rely on another API that wraps it is a matter of choice. I looked at a couple. PDFKitten has a lot of functionality and seems to be by far the most sophisticated open source library, but its documentation didn’t cover the simple thing I needed – text extraction. There’s another one called pdfiphone that I struggled to get working, and which epitomises the main challenge I had with this project: I have only a rudimentary knowledge of C, and Quartz is a C API.

So the basic structure of a PDF page’s content is a stream of operators, each preceded by its operands. You break the document down into pages, scan each page’s content stream, and register C callback functions for the operators you care about. This is a simple adaptation of example code straight from the Quartz documentation:

CGPDFOperatorTableRef myTable = CGPDFOperatorTableCreate();
// Register the callback before scanning, so it fires whenever the scanner meets "TJ"
CGPDFOperatorTableSetCallback(myTable, "TJ", getString);
for (NSInteger thisPageNum = 0; thisPageNum < numOfPages; thisPageNum++)
{
   // PDF pages are 1-indexed; a "Get" function doesn't transfer ownership,
   // so currentPage must not be released here
   CGPDFPageRef currentPage = CGPDFDocumentGetPage(*inputPDF, thisPageNum + 1);
   CGPDFContentStreamRef myContentStream = CGPDFContentStreamCreateWithPage(currentPage);
   CGPDFScannerRef myScanner = CGPDFScannerCreate(myContentStream, myTable, NULL);
   CGPDFScannerScan(myScanner);
   CGPDFScannerRelease(myScanner);
   CGPDFContentStreamRelease(myContentStream);
}
CGPDFOperatorTableRelease(myTable);

The call to CGPDFOperatorTableSetCallback registers my own C function, ‘getString’, to be called whenever the scanner encounters the “TJ” operator. Here’s the first part that was new to me: blending C and Objective-C. My function, which is an adaptation of code I found here, looks like this:

void getString(CGPDFScannerRef inScanner, void *userInfo)
{
   CGPDFArrayRef array;
   // TJ takes a single array operand; give up if it isn't there
   if (!CGPDFScannerPopArray(inScanner, &array))
      return;
   size_t count = CGPDFArrayGetCount(array);
   for (size_t n = 0; n < count; n++)
   {
      // The array mixes strings with numeric kerning adjustments; only strings succeed here
      CGPDFStringRef string;
      if (CGPDFArrayGetString(array, n, &string))
      {
         // CGPDFStringCopyTextString returns a +1 CFStringRef, so hand ownership to ARC
         NSString *data = (__bridge_transfer NSString *)CGPDFStringCopyTextString(string);
         if (data)
            [globalSelf appendMe:data];
      }
   }
}

So there are a couple of things going on here. This code looks for strings – well, Quartz-style CGPDFStringRefs – in the array that the scanner pops off the stack for the TJ operator. For each one it finds, it converts it into an NSString via a ‘bridge cast’ – something I’ve come across before when working with the keychain, and which you need for ARC compliance. I then pass that string to an instance method, appendMe:, which appends it to a property.

It’s not possible to refer to ‘self’ from a C function. There are a number of possible ways around this, some of which get pretty nasty. The most elegant that I found was this:

static P2VConverter *globalSelf;

-(void)setMyselfAsGlobalVar
{
   globalSelf = self;
}

…which assigns the instance of the class I created to do the PDF processing to a static variable, globalSelf, which I can then refer to in place of self. To say this implementation isn’t particularly elegant is an understatement – it assumes only one converter exists at a time and isn’t thread-safe – but it works.
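For comparison, the conventional C way around the missing self is a context pointer rather than a global. As far as I can tell from the Quartz API, the third argument to CGPDFScannerCreate (NULL in the loop above) is handed back to each operator callback as its void * parameter, the userInfo in getString. Passing (__bridge void *)self there and casting back inside the callback would therefore avoid globalSelf altogether. Quartz can’t be compiled outside Apple platforms, so here is the pattern in plain C. The run_scanner function and Converter struct are hypothetical stand-ins for the scanner and P2VConverter:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for P2VConverter: accumulates extracted text. */
typedef struct {
    char text[256];
} Converter;

/* Same shape as a Quartz operator callback: the context arrives as void *. */
typedef void (*OperatorCallback)(const char *operand, void *info);

/* Hypothetical stand-in for the scanner: invokes the callback once per
   operand, passing the caller's context pointer straight through. */
static void run_scanner(const char *const *operands, size_t count,
                        OperatorCallback callback, void *info)
{
    for (size_t i = 0; i < count; i++)
        callback(operands[i], info);
}

/* The callback recovers its "self" from info rather than from a global. */
static void append_string(const char *operand, void *info)
{
    Converter *self = (Converter *)info;
    assert(self != NULL);
    size_t used = strlen(self->text);
    /* Append, truncating rather than overflowing the fixed buffer. */
    snprintf(self->text + used, sizeof self->text - used, "%s", operand);
}
```

Each Converter travels with its own scan, so two conversions can't trample on a shared static variable.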

There is a rich set of operators, defined by a published PDF standard – all 800 pages of it – that tell whatever is rendering the document what to do with it. The best general explanation I found was this. There is a relatively small set of text-showing operators (Tj, TJ, ' and "), and TJ seemed the simplest to work with. It’s also the only one I was able to adapt from other examples. I may come back to this again.
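To make the TJ payload concrete: in a page’s content stream, TJ takes a single array that mixes string operands with numeric positioning adjustments, something like [(Hel) -20 (lo)] TJ. That is why getString only picks out the elements that are strings. Below is a toy C sketch of pulling the text out of one such array, using a hypothetical extract_tj_text function. It assumes the simplest case only. A real parser (and Quartz) would also have to handle escape sequences, nested parentheses, hex strings and font encodings:

```c
#include <assert.h>
#include <stddef.h>

/* Copies the literal strings from a TJ array such as "[(Hel) -20 (lo)]"
   into out, concatenated, skipping the numeric kerning adjustments.
   Toy assumptions: no escapes, no nested parentheses, no hex strings.
   Returns the number of characters written (excluding the terminator). */
size_t extract_tj_text(const char *array, char *out, size_t out_size)
{
    assert(array != NULL && out != NULL && out_size > 0);
    size_t written = 0;
    int in_string = 0;
    for (const char *p = array; *p != '\0'; p++) {
        if (!in_string) {
            /* Outside a string, only '(' matters; numbers, brackets
               and spaces are skipped. */
            if (*p == '(')
                in_string = 1;
        } else if (*p == ')') {
            in_string = 0;              /* end of a string operand */
        } else if (written + 1 < out_size) {
            out[written++] = *p;        /* keep it, leaving room for '\0' */
        }
    }
    out[written] = '\0';
    return written;
}
```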

I tested this by converting an HTML page into a PDF using Safari. The more complicated the text structure in your input document – multiple columns per page, text boxes and so on – the worse this simple extraction mechanism will cope.

On to importing files from email. This isn’t something I’d ever looked at before, and it turned out to be a little more complicated than I expected. The amendments to Info.plist, which create the association between the file type and the app, are pretty trivial. When you open a PDF in the app from the contextual menu in the PDF viewer, the file is copied into the app’s Documents/Inbox folder, and a file:// URL pointing at it is passed to the AppDelegate – specifically, to the application:handleOpenURL: method. My first assumption was that I could import the view controller’s header into the AppDelegate (this is a single view app, so ViewController.h), instantiate the VC, call a method exposed in the header, and be done:

ViewController *thisVC = [[ViewController alloc] init];
[thisVC importFromEmail:url];

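This is wrong: it creates a brand-new ViewController that isn’t the one the storyboard puts on screen, and isn’t loaded from the storyboard either, so its outlets are nil. It led to some peculiar side effects, which emerged when I tried to set the point in the text at which speech resumes after a change to the scrubbing control. This is what I came up with, a variant of this, adapted for a single view app: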

UIStoryboard *storyboard = [UIStoryboard storyboardWithName:@"Main" bundle:nil];
UINavigationController *root = [[UINavigationController alloc]initWithRootViewController:[storyboard instantiateViewControllerWithIdentifier:@"EntryVC"]];
self.window.rootViewController= root;
ViewController *thisVC = (ViewController*)[[root viewControllers] objectAtIndex:0];
if (url != nil && [url isFileURL]) 
{
   [thisVC importFromEmail:url];
   NSLog(@"url: %@", url);
}

A couple of points of note. First, the method being called here runs before viewDidLoad or viewWillAppear. I normally do various inits in viewWillAppear, so I moved them into a method that I call at the start of importFromEmail. Second, the string passed to instantiateViewControllerWithIdentifier (here "EntryVC") needs to be set as the view controller’s Storyboard ID in the storyboard.

Apart from the nasty callouts to C, most of my time went on the scrubbing control functionality. I created an IBAction method that is called when the scrubber changes position. In the storyboard, I set the maximum value of the control to 1, so its value is the fraction of the way through the text. To get the new position after dragging the control, I multiply that fraction by the length of the original PDF string.
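Multiplying the slider fraction by the string length gives a character offset rather than a word index, so playback can resume mid-word. Here is a minimal C sketch of the calculation with two additions that aren’t in the app as described. It clamps the value to the control’s 0 to 1 range, and it snaps back to the start of the word the offset lands in. resume_offset is a hypothetical name, and the function works on a plain C string rather than on the NSString from the PDF:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Converts a scrubber position (0.0 at the start, 1.0 at the end) into a
   character offset into text, snapped back to the start of the word it
   falls in, so that resumed speech doesn't begin mid-word. */
size_t resume_offset(double fraction, const char *text)
{
    assert(text != NULL);
    size_t length = strlen(text);
    if (fraction <= 0.0)
        return 0;
    if (fraction >= 1.0)
        return length;
    size_t offset = (size_t)(fraction * (double)length);
    /* Walk back until the preceding character is whitespace or the start. */
    while (offset > 0 && !isspace((unsigned char)text[offset - 1]))
        offset--;
    return offset;
}
```

The substring to speak then starts at that offset and runs for strlen(text) - offset characters. With the text "hello world foo", a value of 0.5 lands on the "o" of "world" (offset 7), and the function snaps that back to offset 6.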

Having stopped the playback – not paused; more on that in a second – I then start a new playback in the IBAction method for the play button. It speaks a substring of the original PDF string that starts at the position from the scrubber control and has a length equal to the original string length minus that position. A little twiddling was necessary to make this work when called multiple times.

Part of the reason I took this approach was that the continueSpeaking method on the AVSpeechSynthesizer class didn’t seem to work. That was because I was using stopSpeakingAtBoundary: instead of pauseSpeakingAtBoundary: – something I’ve only just noticed. Doh!

This has a knock-on effect: the play button has to be stateful. It should continue if pause was pressed, or restart with the new substring if the scrubber control has been moved. Given how basic the string conversion is, fixing this would take more effort than the app is worth.
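The stateful play button boils down to a small state machine: remember why playback stopped, and have the play button either continue or restart accordingly. Here is a minimal C sketch, with hypothetical names throughout. In the app, ACTION_CONTINUE would map onto AVSpeechSynthesizer’s continueSpeaking after pauseSpeakingAtBoundary:. ACTION_RESTART would map onto speaking a new utterance built from the substring at restart_offset:

```c
#include <assert.h>
#include <stddef.h>

/* Where playback currently stands. */
typedef enum { STATE_IDLE, STATE_PLAYING, STATE_PAUSED, STATE_SCRUBBED } PlayerState;

/* What the play button should do when pressed. */
typedef enum { ACTION_NONE, ACTION_CONTINUE, ACTION_RESTART } PlayAction;

typedef struct {
    PlayerState state;
    size_t restart_offset;   /* character offset chosen with the scrubber */
} Player;

void player_pause(Player *p)
{
    if (p->state == STATE_PLAYING)
        p->state = STATE_PAUSED;
}

/* Moving the scrubber only records the new position (the caller stops any
   speech); playback restarts from it the next time play is pressed. */
void player_scrub(Player *p, size_t offset)
{
    p->restart_offset = offset;
    p->state = STATE_SCRUBBED;
}

PlayAction player_press_play(Player *p)
{
    assert(p != NULL);
    PlayAction action;
    switch (p->state) {
    case STATE_PAUSED:
        action = ACTION_CONTINUE;   /* resume exactly where we paused */
        break;
    case STATE_SCRUBBED:            /* speak the substring from restart_offset */
    case STATE_IDLE:
        action = ACTION_RESTART;
        break;
    default:
        action = ACTION_NONE;       /* already playing */
        break;
    }
    if (action != ACTION_NONE)
        p->state = STATE_PLAYING;
    return action;
}
```

In this sketch, moving the scrubber overrides a pause: a scrub always means a restart from the new position.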

A couple of final comments. I found it’s best to keep the IBAction method for scrubbing control changes pretty simple: basically just setting the property for the new position. I got some peculiar results when I tried to do string manipulation on a property there, apparently because the method was being called again before the previous call had finished setting it.

Lastly, I encountered an odd and as yet unresolved bug when, as an afterthought, I added support for speaking text typed directly into a UITextField. I started seeing log errors of the form _BSMachError: (os/kern) invalid name (15). This appears to be quite a common one, as the number of upvotes on this question attests. I’m filing it in the same bucket as the stateful play button: if the quality of the playback warranted it, I’d figure it out.


I was in two minds about whether to write this app up given the mixed results, but I thought I might as well, as it may be one of my last pieces of Objective-C development.

I blogged a year ago, almost to the day, about my first Apple Watch app. I had one pretty serious watch project last summer, which I ended up canning around November. The idea was to sync accelerometer data from the watch with slow-motion video, say of a golf swing. However, I ran into a problem with the precision of the data in AVAsset metadata, about which I have an as-yet unanswered question on Stack Overflow.

I also ran into the same usage issues with the watch that have been widely reported in the tech press. While I really liked the notifications [and the watch itself, which is a lovely piece of kit], the absence of any other compelling functionality meant it barely warranted the bother of charging the thing. The borderline utility really came to the fore for me when I was travelling, for both work and holidays, with no roaming network access. The watch stayed at home.

I also think it’s pretty telling that I bought precisely zero apps for it before I sold it a couple of months ago.

I’ll be looking to replace my current iPhone 6 Plus later this summer and I’m toying with the idea of moving to Android. I tried it before and hated it, but that was with a pretty low-end phone that I got as a stopgap after my 4S had an unfortunate encounter with the washing machine. Java would be much more useful than Objective-C in my working life, and it’s a potentially interesting route into the language.

I’m sure it’s nothing personal…

Like probably the vast majority of people who’ve been running WordPress for more than a few months, I find my site is frequently hit by automated attacks. I only noticed this in my logs recently, so I thought it would be interesting to take a closer look.

Around the turn of the year, for reasons I can’t recall, I happened to look at the raw access logs and noticed a lot of references to ‘xmlrpc.php’, which look like this:

142.4.4.190 - - [31/Jan/2016:18:13:42 +0000] "POST /blog/xmlrpc.php HTTP/1.0" 200 58043 "-" "-"

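This is a real log file entry, and is a classic example of an XMLRPC brute-force amplification attack: a single POST to xmlrpc.php that bundles many login attempts (via XML-RPC’s system.multicall) to try to brute-force the admin password. The 58043 is the size in bytes of my server’s response, not of the request, and a response that big is consistent with lots of attempts being answered at once. I disabled the mechanism, and only verified that it’s working this morning [two months later :)], because the 200 server response was a bit more polite than I would have expected.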

At the same time I installed [yet another] plugin, which rate limits failed admin password authentication attempts. It started triggering last week with repeated admin authentication failures from a machine in Hanoi. In my latest access log file [31st January to about half an hour ago], I have 1500 POST attempts which look like this:

123.30.140.199 - - [26/Feb/2016:13:37:47 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 3766 "-" "-"

I’ve not paid much attention to log formats in a long time, so I had to google what those final two hyphens are: a blank referer [note to my wife on the spelling :)] and a blank user agent field respectively. The blank user agent points to some sort of automated attack. Since whoever’s running it hasn’t even bothered to make it look like a real browser, it isn’t a particularly sophisticated one.
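For reference, these lines appear to be in what Apache calls the combined log format. The fields are the client IP, identity and user (the first two hyphens), timestamp, request line, status code, response size in bytes, referer and user agent. Here is a rough C sketch that splits one line into those fields and flags a blank user agent. It assumes well-formed lines with non-empty quoted fields and no escaped quotes. LogEntry, parse_log_line and has_blank_user_agent are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char ip[46];          /* long enough for an IPv6 address */
    char request[256];    /* e.g. "POST /blog/wp-login.php HTTP/1.0" */
    int status;           /* e.g. 200 or 403 */
    long bytes;           /* response size; "-" in the log is stored as 0 */
    char referer[256];
    char user_agent[256];
} LogEntry;

/* Parses one combined-format line. Returns 1 on success, 0 otherwise. */
int parse_log_line(const char *line, LogEntry *e)
{
    assert(line != NULL && e != NULL);
    char bytes[32];
    /* ip, skip identity and user, skip [timestamp], then the quoted
       request, status, size, and the quoted referer and user agent. */
    int n = sscanf(line,
                   "%45s %*s %*s [%*[^]]] \"%255[^\"]\" %d %31s \"%255[^\"]\" \"%255[^\"]\"",
                   e->ip, e->request, &e->status, bytes, e->referer, e->user_agent);
    if (n != 6)
        return 0;
    if (sscanf(bytes, "%ld", &e->bytes) != 1)
        e->bytes = 0;     /* "-" means no response body was sent */
    return 1;
}

/* A "-" user agent: no attempt even to look like a real browser. */
int has_blank_user_agent(const LogEntry *e)
{
    return strcmp(e->user_agent, "-") == 0;
}
```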

The logging pattern suggests what you’d expect: someone has harvested a list of servers that are running WordPress [how? by looking for the common pages that WordPress hosts – a 200 in response to a GET for ~/wp-login.php, for instance], and is stepping through them.

This is another indicator of the lack of sophistication:

123.30.140.199 - - [26/Feb/2016:16:41:35 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 1643 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:41:37 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 1643 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:41:38 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 1643 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:41:44 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 1643 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:41:46 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 1643 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:41:58 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 1643 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:41:59 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 1643 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:42:05 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 1643 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:42:06 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 1643 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:42:13 +0000] "POST /blog/wp-login.php HTTP/1.0" 200 1643 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:42:14 +0000] "POST /blog/wp-login.php HTTP/1.0" 403 9 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:42:20 +0000] "POST /blog/wp-login.php HTTP/1.0" 403 9 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:42:21 +0000] "POST /blog/wp-login.php HTTP/1.0" 403 9 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:42:22 +0000] "POST /blog/wp-login.php HTTP/1.0" 403 9 "-" "-"
123.30.140.199 - - [26/Feb/2016:16:42:23 +0000] "POST /blog/wp-login.php HTTP/1.0" 403 9 "-" "-"

What’s happening here is that the rate-limiting plugin is blocking the attacker’s IP address after 10 authentication failures, shown by the 403, which is the server returning ‘Forbidden’. What I’ve trimmed from the log extract above is that there are a total of 25 Forbidden responses in a row. The attack software isn’t checking the server response codes, which is a waste of resources on its part.
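Counting that waste is straightforward: find the first 403 for the address and count what follows. Here is a small C sketch that works over the status codes for one IP in log order, however you extract them. requests_after_block is a hypothetical name. The sketch assumes that once the plugin starts returning 403s, every later request from that address is refused too:

```c
#include <assert.h>
#include <stddef.h>

/* Given the status codes logged for one IP address in chronological order,
   returns how many requests arrived after the first 403 Forbidden, i.e.
   attempts the attacking software made despite already being blocked.
   Returns 0 if the address was never blocked. */
size_t requests_after_block(const int *statuses, size_t count)
{
    assert(statuses != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (statuses[i] == 403)
            return count - i - 1;
    }
    return 0;
}
```

For the run described above (10 successful POSTs followed by 25 Forbidden responses), this would report 24.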

I’ve had a bit of a trawl through my logs and am seeing similar, albeit less determined, attacks coming from all sorts of far-flung places:

62.109.19.98 - - [13/Feb/2016:07:46:48 +0000] "POST /blog/xmlrpc.php HTTP/1.0" 200 58043 "-" "-"

That’s another XMLRPC bruteforce amplification attack, from Russia. A geolocation site reckons this one…

204.232.224.64 - - [12/Feb/2016:07:12:33 +0000] "POST /blog/xmlrpc.php HTTP/1.0" 200 58043 "-" "-"

…is in San Antonio, Texas. Interestingly, the response sizes are identical: 58,043 bytes. Again, that points to the same off-the-shelf attack software sending a pre-canned payload, which draws the same response each time. Let’s do one more of these:

1.83.251.239 - - [11/Feb/2016:02:19:14 +0000] "POST /blog/xmlrpc.php HTTP/1.1" 200 45387 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"

I can honestly say that since I first started messing around on the internet in 1992, I’ve never seen an IP address that starts with 1. The geolocation service dutifully informs me that the machine that sent this parcel of good intention is located in Xi’an in China. At least they’ve spiced things up a bit, with a browser-like user agent and, judging by the different response size, a different payload.

So here’s a thing: I have a couple of blog posts on this site about a holiday we had in Vietnam. I blogged about a holiday to China that included a trip to Xi’an. I’ve also got a post about a work trip to Russia. Russia and China are massive, populous countries, so that part could be coincidence. But Hanoi, and Xi’an in China? That looks like a pattern to me. I wonder whether the bundle of joy – malware, whatever it is – that would be deposited on my site if it were compromised is tailored or localised in some way, based on the occurrence of those locations.

As per the title, and given the obvious lack of finesse, I know that my server is just one of what’s probably a very long list of candidates that these automated attacks are hitting. WordPress has had something of a chequered history from a security point of view: it’s a natural target. While I’ve done the easy stuff to shore it up, like blocking requests with a blank user agent, the options are relatively limited. That’s fine, given the fairly low-rent nature of the stuff being thrown at it, but I’d really prefer not to be distributing malware to people. Migrating off WordPress looks like it would be a pain, so if the ancillary approaches start to look like too much trouble I’ll just delete the site.