Google Cookbook - Google App Engine

Posted by skibaa on Sat 03 Jan 2009 in Datastore
Extend ReferenceProperty so that the collection is not fetched every time you iterate through it. Instead of returning a query, the back-reference property modelname_set returns the whole collection from memory.
class Master(db.Model):
    pass

class Detail(db.Model):
    master = CachedReferenceProperty(Master)

m = Master()
m.save()
d = Detail(master=m)
d.save()
m.detail_set      # fetches and returns [d]
m.detail_set      # returns cached value, no datastore access
del m.detail_set  # resets the cached value
m.detail_set      # value fetched again

Note that every entity in the returned collection already has its hidden attribute __RESOLVED_master set to the same Master instance, so you can go back and forth efficiently like this:
m.detail_set[0].master.detail_set[-1].master

With the original ReferenceProperty, similar code would fetch both Master and Detail again and again.
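The caching behaviour described in this recipe (fetch once, reuse, reset with del) can be sketched without the datastore. This is only an illustration of the idea, not the source of CachedReferenceProperty: the CachedCollection descriptor, the plain Master class and its _load_details method are hypothetical stand-ins, and the list returned by _load_details takes the place of a real query.

```python
class CachedCollection(object):
    """Descriptor that calls `fetch` once per instance and caches the result."""

    def __init__(self, fetch):
        self.fetch = fetch
        # Name of the instance attribute that holds the cached value.
        self.cache_attr = '_cached_' + fetch.__name__

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if not hasattr(instance, self.cache_attr):
            # First access: run the (expensive) fetch and remember the result.
            setattr(instance, self.cache_attr, self.fetch(instance))
        return getattr(instance, self.cache_attr)

    def __delete__(self, instance):
        # `del obj.attr` drops the cached value so the next access refetches.
        if hasattr(instance, self.cache_attr):
            delattr(instance, self.cache_attr)


class Master(object):
    fetch_count = 0  # counts how often the "datastore" was hit

    def _load_details(self):
        # Hypothetical stand-in for running the back-reference query.
        Master.fetch_count += 1
        return ['detail-1', 'detail-2']

    detail_set = CachedCollection(_load_details)
```

Reading detail_set twice on one instance calls _load_details once; after del m.detail_set the next read calls it again.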
Posted by dw@botanicus.net on Tue 30 Dec 2008 in Datastore
This property uses a simple scheme to reduce the footprint of a list of positive integers.
>>> myList = [1,2,4,6,8,16,32,64,128,256,2048,8192,16384,32768,65536,131072] * 1000
>>> len(encode(myList))
28000
>>> len(array.array('L', myList).tostring())
64000



The encoding uses the 8th bit of each byte to signal whether consecutive bytes contain more bits for the current integer, thus leaving 7 bits available for the actual data. This means integers < 128 require 1 byte, < 16384 require 2 bytes, etc. The list's length is also implicit in the encoding; decoding simply continues reading new numbers until there is no more data. A side effect of this is that 0-sized lists take up almost no space.
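Here is a minimal sketch of such an encoding, written from the description above rather than taken from the property's source. It assumes the least significant 7-bit group is written first, which the description does not specify; the names encode and decode simply mirror the session above.

```python
def encode(numbers):
    """Pack non-negative integers, 7 data bits per byte."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError('only non-negative integers are supported')
        while True:
            low_bits = n & 0x7F
            n >>= 7
            if n:
                # More bits follow for this integer: set the 8th bit.
                out.append(low_bits | 0x80)
            else:
                # Last byte of this integer: 8th bit stays clear.
                out.append(low_bits)
                break
    return bytes(out)


def decode(data):
    """Read integers until the data runs out; the length is implicit."""
    numbers = []
    current = 0
    shift = 0
    for byte in bytearray(data):
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            numbers.append(current)
            current = 0
            shift = 0
    return numbers
```

With this sketch the 16,000-element list from the session above also encodes to 28,000 bytes, and an empty list encodes to an empty string.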

While the property can handle arbitrarily sized numbers, it only provides a meaningful win when your data may require a large value range, while typically using a smaller range. A contrived example of where this might be true is a music file format, where the delay between 2 events is usually small, however occasional long pauses in the track might introduce large values.

Even if this scheme doesn't immediately fit your needs, it may be possible to massage your data to work with it. For example, when encoding keyframe positions in a video file, rather than storing the absolute timestamps, which will always increase, instead store the distance between each keyframe, which will result in smaller numbers.
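The keyframe idea could look roughly like this. The helper names to_deltas and from_deltas are made up for this sketch, and the timestamps in the usage note are invented; positions are assumed to be non-decreasing so that every delta stays non-negative.

```python
def to_deltas(positions):
    """Turn absolute, non-decreasing positions into gaps between neighbours."""
    deltas = []
    previous = 0
    for position in positions:
        deltas.append(position - previous)
        previous = position
    return deltas


def from_deltas(deltas):
    """Rebuild the absolute positions by summing the gaps."""
    positions = []
    total = 0
    for delta in deltas:
        total += delta
        positions.append(total)
    return positions
```

For example, to_deltas([1000, 1040, 1080, 1125]) gives [1000, 40, 40, 45], so only the first value needs more than one byte under the 7-bit scheme.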

For a more CPU-efficient class that supports a wider range of types with predictable storage size, check out the recipe "Store arrays of numeric values efficiently in the datastore".
Posted by rodrigo.moraes on Mon 29 Dec 2008 in Images API
The Images API doesn't provide one very basic piece of information: the width and height of the loaded image. This info is needed too commonly to be missing. For example, we may not want to create a thumbnail if an uploaded image is already small enough, or we may not want to resize an image to a size bigger than the original one.

Until the API adds a way to get the image dimensions, we can use this function, extracted from pyib (http://code.google.com/p/pyib-standalone/):

Edit:
The cookbook completely messed with the code I posted, so please grab the function getImageInfo() directly from the file:

http://code.google.com/p/pyib-standalone/source/browse/trunk/img.py

These are the modules you'll need to import:

import struct
from StringIO import StringIO


It works for GIF, PNG16, PNG24 and JPEG.
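To give an idea of what such a function does, here is a much smaller sketch that reads the dimensions from the fixed-position headers of GIF and PNG data. It is not the getImageInfo() function from pyib; the name gif_or_png_size is made up here, and JPEG is left out because its size is not at a fixed offset.

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def gif_or_png_size(data):
    """Return (width, height) for GIF or PNG bytes, or None if unrecognized."""
    # GIF: 6-byte signature, then width and height as little-endian uint16.
    if len(data) >= 10 and data[:6] in (b'GIF87a', b'GIF89a'):
        return struct.unpack('<HH', data[6:10])
    # PNG: 8-byte signature, 4-byte chunk length, 'IHDR', then width and
    # height as big-endian uint32.
    if (len(data) >= 24 and data[:8] == PNG_SIGNATURE
            and data[12:16] == b'IHDR'):
        return struct.unpack('>II', data[16:24])
    return None
```

Only the first 24 bytes of the uploaded data are inspected, so the check is cheap enough to run before deciding whether to create a thumbnail.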
Posted by rodrigo.moraes on Mon 29 Dec 2008 in Images API
We need the Python Imaging Library (PIL) to use the Images API in the dev server. Although it is available through MacPorts or DarwinPorts, I could not get it working using a port. It is easy enough to compile the module yourself, though. Do the following:

1. Make sure you already have all the dependencies (freetype, jpeg, zlib). If not, install them first, using your favorite port manager.
2. Download PIL from http://effbot.org/downloads/Imaging-1.1.6.tar.gz
3. Unpack and install the library:
   tar -xzvf Imaging-1.1.6.tar.gz
   cd Imaging-1.1.6
   sudo python setup.py install


That's it.
Posted by dsears on Thu 25 Dec 2008 in Datastore
Here's a method to get or create an entity based on a unique set of properties:

from google.appengine.ext import db

class SuperModel(db.Model):
    @classmethod
    def get_or_insert_by(cls, parent=None, **kwds):
        # Look for an existing entity that matches every given property.
        query = db.Query(cls)
        if parent is not None:
            query.ancestor(parent)
        for kw in kwds:
            query.filter("%s =" % kw, kwds[kw])
        entity = query.get()
        if entity is not None:
            return entity
        # Nothing found: return a new, unsaved entity.
        entity = cls(parent=parent, **kwds)
        return entity

class Movie(SuperModel):
    name = db.StringProperty()
    year = db.IntegerProperty()

movie = Movie.get_or_insert_by(name="Magnolia", year=1999)
if not movie.is_saved():
    movie.put()



Note that new entities are returned unsaved. This gives you a chance to set additional properties before the initial put. Alternatively, if you just want to do input validation for new entities, is_saved() == True can be used to catch pre-existence.

Be careful; duplication can still occur under high concurrency, or if other code creates entities. This does not add a unique constraint to the model.
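If duplicates are a real concern, one possible workaround (not part of this recipe) is to derive a deterministic key name from the unique properties and let the built-in Model.get_or_insert(key_name, **kwds), which runs in a transaction, do the creation. The helper below is only a sketch: the name unique_key_name and the key format are assumptions, and it relies on the property values having unambiguous string forms.

```python
def unique_key_name(kind, **props):
    """Build a repeatable key name from a kind and its unique properties."""
    # Sort by property name so the same properties always give the same key.
    parts = ['%s=%s' % (name, props[name]) for name in sorted(props)]
    # Prefix with the kind so the name never starts with a digit.
    return '%s:%s' % (kind, '&'.join(parts))


# Hypothetical usage with the Movie model from this recipe
# (requires the App Engine SDK, so it is left as a comment):
#
#   key_name = unique_key_name('Movie', name='Magnolia', year=1999)
#   movie = Movie.get_or_insert(key_name, name='Magnolia', year=1999)
```

This only protects entities that are always created through the same key name; anything stored under another key can still duplicate the values.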
Posted by albrecht.andi@googlemail.com on Mon 15 Dec 2008 in Datastore
Sometimes you want to know what actually happens on the datastore when running an application. This is especially true when dealing with Pythonic database models: you use their attributes and methods without realizing at first glance that they issue a bunch of datastore queries behind the scenes.

For WSGI-based App Engine applications you can use the profiling module described in this recipe to get an idea of what actually happens on the datastore.


Using the Profile Module

To enable datastore profiling, import the attached profiling module in your main application module (usually called main.py) and replace your main function with the debug_main function provided by the profiling module.

Example main.py:
[...]
def main():
    run_wsgi_app(application)

# Uncomment the following two lines to enable datastore profiling:
#import datastoreprofile
#main = datastoreprofile.debug_main(main)
[...]



Output Channels

The profiling module provides three different output channels:

'memcache' - View profiling data on a separate page.
'inline' - Adds profiling data at the bottom of a page (default).
'log' - Writes profiling data using the logging framework.

To change the output channel, set the OUTPUT_CHANNEL variable after importing the profiling module. For example:
import datastoreprofile
datastoreprofile.OUTPUT_CHANNEL = 'inline'


Using the 'inline' channel is the easiest way to view profiling data, but since this module collects a lot of data, the resulting page size could easily hit the 1MB limit.

So, using the 'memcache' channel is the recommended way to view profiling data. To enable the separate page to view collected data add
- url: /datastoreprofile/.*
  script: datastoreprofile.py


to your app.yaml and open e.g. http://localhost:8080/datastoreprofile/ in your browser.

When using the 'log' channel open the log viewer, set "Minimum Severity" to "Debug" and use "^DATASTORE:" as filter to view collected data.


Known Bugs

If the application reloads when running with the SDK (dev_appserver.py), some parts of the profiling module get messed up. As a workaround, just restart the SDK server.


Links

Source code: http://git.andialbrecht.de/gitweb/?p=gae-datastoreprofile.git
A blog post about this module: http://andialbrecht.blogspot.com/2008/12/profiling-datastore-access-on-app.html
Posted by rodrigo.moraes on Sun 14 Dec 2008 in Datastore
Here's a property to store objects. It pickles/unpickles the object automatically when saving or fetching:

from google.appengine.ext import db
import pickle

class PickledProperty(db.Property):
    data_type = db.Text

    def get_value_for_datastore(self, model_instance):
        # Pickle the current value before it is written.
        value = self.__get__(model_instance, model_instance.__class__)
        if value is not None:
            return db.Text(pickle.dumps(value))

    def make_value_from_datastore(self, value):
        # Unpickle the stored text when the entity is loaded.
        if value is not None:
            return pickle.loads(str(value))



To force a certain type to be stored, we would extend the class and define a type in force_type:

from google.appengine.ext import db
from google.appengine.api import datastore_errors
import pickle

class PickledProperty(db.Property):
    data_type = db.Text
    force_type = None

    def validate(self, value):
        value = super(PickledProperty, self).validate(value)
        # Reject values of the wrong type when force_type is set.
        if value is not None and self.force_type and \
            not isinstance(value, self.force_type):
                raise datastore_errors.BadValueError(
                    'Property %s must be of type "%s".' % (self.name,
                        self.force_type))
        return value

    def get_value_for_datastore(self, model_instance):
        value = self.__get__(model_instance, model_instance.__class__)
        if value is not None:
            return db.Text(pickle.dumps(value))

    def make_value_from_datastore(self, value):
        if value is not None:
            return pickle.loads(str(value))



An extended property would look like this:
# the object to be pickled and stored
class MyObject(object):
    ...

# the property that forces values to be of type MyObject
class MyObjectProperty(PickledProperty):
    data_type = db.Text
    force_type = MyObject

Posted by rodrigo.moraes on Thu 11 Dec 2008 in Datastore
Arachnid suggested this property in the IRC channel and I adapted the previous NamedBooleanProperty to be more generic and extensible: EnumProperty can be set using a string value, and it returns the string value when read, but it stores the index of the choices list in the datastore.

from google.appengine.ext import db

class EnumProperty(db.Property):
    """
    Maps a list of strings to be saved as int. The property is set or read
    using the string value, but it is stored using its index in the 'choices'
    list.
    """
    data_type = int

    def __init__(self, choices=None, **kwargs):
        if not isinstance(choices, list):
            raise TypeError('Choices must be a list.')
        super(EnumProperty, self).__init__(choices=choices, **kwargs)

    def get_value_for_datastore(self, model_instance):
        value = self.__get__(model_instance, model_instance.__class__)
        if value is not None:
            return int(self.choices.index(value))

    def make_value_from_datastore(self, value):
        if value is not None:
            return self.choices[int(value)]

    def empty(self, value):
        return value is None



Refactoring the GenderProperty, we would do:
class Profile(db.Model):
    ...
    gender = EnumProperty(choices=['male', 'female'])

profile = Profile(gender='female')
profile.put()


It stores int 1 in the datastore.
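The mapping itself is easy to see outside the datastore. The following is just an illustration of the two conversions EnumProperty performs; the helper names to_stored and from_stored are made up for this sketch.

```python
CHOICES = ['male', 'female']


def to_stored(value, choices=CHOICES):
    """String -> index, as done when the entity is written."""
    return choices.index(value)


def from_stored(index, choices=CHOICES):
    """Index -> string, as done when the entity is read back."""
    return choices[index]
```

Because only the index is stored, reordering or removing entries in the choices list after data has been saved changes what the stored numbers mean; appending new choices at the end is the safe way to extend the list.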
Posted by rodrigo.moraes on Tue 09 Dec 2008 in Datastore
And the previous GenderProperty recipe leads to another one, NamedBooleanProperty: the idea is to store a boolean value but set or retrieve it using a string. So we define a property passing a list with two strings corresponding to False and True, in that order. Using the gender property example, we would do:
class Profile(db.Model):
    ...
    gender = NamedBooleanProperty(values=['female', 'male'])

entity = Profile(gender='female')
entity.put()


Now the property only accepts one of the values in the 'values' list, but saves it as a boolean. When retrieved from the datastore, the property value will be the corresponding string in the list.


The GenderProperty can also be refactored to be:
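class GenderProperty(NamedBooleanProperty):
    """
    Stores a gender ('male' or 'female') as boolean. The property is set or
    retrieved using the string.
    """
    def __init__(self, **kwargs):
        # Same order as the original GenderProperty, so that values already
        # stored keep their meaning: 'female' is False, 'male' is True.
        super(GenderProperty, self).__init__(values=['female', 'male'], **kwargs)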



Here's the new property:
from google.appengine.ext import db
from google.appengine.api import datastore_errors

class NamedBooleanProperty(db.Property):
    """
    Maps two strings to be saved as boolean. The property is set or
    retrieved using the string value.
    """
    data_type = bool

    def __init__(self, values=None, **kwargs):
        """
        Args:
          values: A list with two strings corresponding to False and True, in
            this order.
        """
        if not isinstance(values, list):
            raise TypeError('Values must be a list.')
        elif len(values) != 2:
            raise ValueError('Values must have length of 2.')

        super(NamedBooleanProperty, self).__init__(**kwargs)
        self.values = values

    def validate(self, value):
        value = super(NamedBooleanProperty, self).validate(value)
        if value is not None and value not in self.values:
            raise datastore_errors.BadValueError(
                'Property %s must be "%s" or "%s".' % (self.name,
                    self.values[0], self.values[1]))
        return value

    def get_value_for_datastore(self, model_instance):
        value = self.__get__(model_instance, model_instance.__class__)
        if value is not None:
            return bool(self.values.index(value))

    def make_value_from_datastore(self, value):
        if value is not None:
            return self.values[int(value)]

    def empty(self, value):
        return value is None

Posted by rodrigo.moraes on Tue 09 Dec 2008 in Datastore
Based on the custom property idea from the recipe "Store arrays of numeric values efficiently in the datastore", I created a GenderProperty for models, so that I can store genders as bool but set/get using a string.

This recipe reduces genders to 'male'/'female'; it is not intended to be sexist or to stir up controversy.

from google.appengine.ext import db
from google.appengine.api import datastore_errors

class GenderProperty(db.Property):
    data_type = bool
    # Index 0 is stored as False, index 1 as True.
    values = ['female', 'male']

    def validate(self, value):
        value = super(GenderProperty, self).validate(value)
        if value is not None and value not in self.values:
            raise datastore_errors.BadValueError(
                "Property %s must be '%s' or '%s'" % (self.name,
                    self.values[0], self.values[1]))
        return value

    def get_value_for_datastore(self, model_instance):
        value = self.__get__(model_instance, model_instance.__class__)
        if value is not None:
            return bool(self.values.index(value))

    def make_value_from_datastore(self, value):
        if value is not None:
            return self.values[int(value)]

    def empty(self, value):
        return value is None



Now gender can be set/retrieved as a string:
class Profile(db.Model):
    ...
    gender = GenderProperty()

entity = Profile(gender='female')
entity.put()